Foreword



ISO (the International Organization for Standardization) and IEC (the International
Electrotechnical Commission) form the specialized system for
worldwide standardization. National bodies that are members of ISO or
IEC participate in the development of International Standards through
technical committees established by the respective organization to deal
with particular fields of technical activity. ISO and IEC technical committees
collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC,
also take part in the work.

In the field of information technology, ISO and IEC have established a joint 
technical committee, ISO/IEC JTC 1. Draft International Standards adopted
by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least
75 % of the national bodies casting a vote.

International Standard ISO/IEC 11172-1 was prepared by Joint Technical 
Committee ISO/IEC JTC 1, Information technology, Sub-Committee SC 29, 
Coded representation of audio, picture, multimedia and hypermedia information.

ISO/IEC 11172 consists of the following parts, under the general title Information
technology - Coding of moving pictures and associated audio
for digital storage media at up to about 1,5 Mbit/s:

— Part 1: Systems 

— Part 2: Video 

— Part 3: Audio 

— Part 4: Compliance testing 

Annexes A and B of this part of ISO/IEC 11172 are for information only.
Introduction



Note - Readers interested in an overview of the MPEG Systems layer should read this Introduction and then
proceed to annex A, before returning to clauses 1 and 2. Since the system target decoder concept is
referred to throughout both the normative and informative clauses of this part of ISO/IEC 11172, it may
also be useful to refer to clause 2.4, and particularly 2.4.2, where the system target decoder is described.



The systems specification addresses the problem of combining one or more data streams from the video and 
audio parts of this International Standard with timing information to form a single stream. Once combined 
into a single stream, the data are in a form well suited to digital storage or transmission. The syntactical 
and semantic rules imposed by this systems specification enable synchronized playback without overflow or 
underflow of decoder buffers under a wide range of stream retrieval or receipt conditions. The scopes of the
syntactical and semantic rules set forth in the systems specification differ: the syntactical rules apply to
systems layer coding only, and do not extend to the compression layer coding of the video and audio
specifications; by contrast, the semantic rules apply to the combined stream in its entirety.

The systems specification does not specify the architecture or implementation of encoders or decoders.
However, bitstream properties do impose functional and performance requirements on encoders and decoders. 
For instance, encoders must meet minimum clock tolerance requirements. Notwithstanding this and other 
requirements, a considerable degree of freedom exists in the design and implementation of encoders and 
decoders. 

A prototypical audio/video decoder system is depicted in figure 1 to illustrate the function of an ISO/IEC
11172 decoder. The architecture is not unique - System Decoder functions including decoder timing control
might equally well be distributed among elementary stream decoders and the Medium Specific Decoder - but
this figure is useful for discussion. The prototypical decoder design does not imply any normative
requirement for the design of an ISO/IEC 11172 decoder. Non-audio/video data is also allowed, but is
not shown.




Figure 1 - Prototypical ISO/IEC 11172 decoder 

The prototypical ISO/IEC 11172 decoder shown in figure 1 is composed of System, Video, and Audio
decoders conforming to Parts 1, 2, and 3, respectively, of ISO/IEC 11172. In this decoder the multiplexed
coded representation of one or more audio and/or video streams is assumed to be stored on a digital storage
medium (DSM), or network, in some medium-specific format. The medium-specific format is not governed
by this International Standard, nor is the medium-specific decoding part of the prototypical ISO/IEC 11172
decoder.

The prototypical decoder accepts as input an ISO/IEC 11172 multiplexed stream and relies on a System
Decoder to extract timing information from the stream. The System Decoder demultiplexes the stream and 
the elementary streams so produced serve as inputs to Video and Audio decoders, whose outputs are decoded 
video and audio signals. Included in the design, but not shown in the figure, is the flow of timing 
information among the System Decoder, the Video and Audio Decoders, and the Medium Specific Decoder. 
The Video and Audio Decoders are synchronized with each other and with the DSM using this timing
information.

ISO/IEC 11172 multiplexed streams are constructed in two layers: a system layer and a compression layer.
The input stream to the System Decoder has a system layer wrapped about a compression layer. Input
streams to the Video and Audio decoders have only the compression layer.

Operations performed by the System Decoder apply either to the entire ISO/IEC 11172 multiplexed stream
("multiplex-wide operations") or to individual elementary streams ("stream-specific operations"). The
ISO/IEC 11172 system layer is divided into two sub-layers, one for multiplex-wide operations (the pack
layer), and one for stream-specific operations (the packet layer).

0.1 Multiplex-wide operations (pack layer) 

Multiplex-wide operations include the coordination of data retrieval off the DSM, the adjustment of clocks 
and the management of buffers. The tasks are intimately related. If the rate of data delivery off the DSM is 
controllable, then DSM delivery may be adjusted so that decoder buffers neither overflow nor underflow;
but if the DSM rate is not controllable, then elementary stream decoders must slave their timing to the
DSM to avoid overflow or underflow.

ISO/IEC 11172 multiplexed streams are composed of packs whose headers facilitate the above tasks. Pack
headers specify intended times at which each byte is to enter the system decoder from the DSM, and this
target arrival schedule serves as a reference for clock correction and buffer management. The schedule need
not be followed exactly by decoders, but they must compensate for deviations about it.
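The target arrival schedule can be illustrated with a small calculation. The following Python sketch is informative only and rests on assumptions not stated in this Introduction: that a pack header carries a system clock reference (`scr`) for one particular byte, counted in ticks of a 90 kHz clock; that `mux_rate` is expressed in units of 50 bytes/s; and that subsequent bytes arrive at that constant rate. The function name `byte_arrival_time` is hypothetical.

```python
SYSTEM_CLOCK_HZ = 90_000  # assumed tick rate of the system clock reference
MUX_RATE_UNIT = 50        # assumed: mux_rate is counted in units of 50 bytes/s


def byte_arrival_time(scr: int, scr_byte_index: int, byte_index: int, mux_rate: int) -> float:
    """Intended arrival time, in seconds, of byte `byte_index` at the system decoder.

    `scr` is the clock value associated with the byte at `scr_byte_index`;
    later bytes are assumed to arrive at the constant rate mux_rate * 50 bytes/s.
    """
    bytes_per_second = mux_rate * MUX_RATE_UNIT
    return scr / SYSTEM_CLOCK_HZ + (byte_index - scr_byte_index) / bytes_per_second
```

For example, with `scr` = 90000 (1 s) and `mux_rate` = 3528 (176400 bytes/s), the byte 1764 positions after the reference byte is scheduled to arrive 10 ms later, at about 1,01 s. A decoder that observes bytes arriving earlier or later than such a schedule would use the difference for clock correction.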

An additional multiplex-wide operation is a decoder's ability to establish what resources are required to
decode an ISO/IEC 11172 multiplexed stream. The first pack of each ISO/IEC 11172 multiplexed stream
conveys parameters to assist decoders in this task. Included, for example, are the stream's maximum data
rate and the highest number of simultaneous video channels.
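A decoder could use these first-pack parameters to decide, before playback starts, whether it has sufficient resources. The sketch below is hypothetical: the field names `rate_bound` and `video_bound` are assumed from the system header syntax of clause 2.4, and `DecoderLimits` is an illustrative type describing the capabilities of one particular decoder.

```python
from dataclasses import dataclass


@dataclass
class SystemParameters:
    rate_bound: int   # assumed: upper bound on the stream's data rate (units of 50 bytes/s)
    video_bound: int  # assumed: maximum number of simultaneously active video streams


@dataclass
class DecoderLimits:
    max_rate: int           # highest data rate this decoder accepts (same units)
    max_video_streams: int  # number of video streams it can decode at once


def can_decode(params: SystemParameters, limits: DecoderLimits) -> bool:
    """Return True if the decoder's resources cover the stream's declared bounds."""
    return (params.rate_bound <= limits.max_rate
            and params.video_bound <= limits.max_video_streams)
```

The check relies only on the declared bounds, so it can be made as soon as the first pack has been read, without scanning the rest of the multiplexed stream.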

0.2 Individual stream operations (packet layer) 

The principal stream-specific operations are 1) demultiplexing, and 2) synchronizing playback of multiple 
elementary streams. These topics are discussed next.

0.2.1 Demultiplexing 

On encoding, ISO/IEC 11172 multiplexed streams are formed by multiplexing elementary streams.
Elementary streams may include private, reserved, and padding streams in addition to ISO/IEC 11172 audio
and video streams. The streams are temporally subdivided into packets, and the packets are serialized. A
packet contains coded bytes from one and only one elementary stream.

Both fixed and variable packet lengths are allowed, subject to the constraints in 2.4.3.3, 2.4.5 and 2.4.6.

On decoding, demultiplexing is required to reconstitute elementary streams from the ISO/IEC 11172 
multiplexed stream. This is made possible by stream_id codes in packet headers. 
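Demultiplexing by stream_id can be sketched as follows. The sketch assumes that packet headers have already been parsed into `(stream_id, payload)` pairs, and that stream_id values 0xC0 to 0xDF denote audio, 0xE0 to 0xEF video, 0xBE padding, and 0xBD and 0xBF private streams. These ranges are assumed from the syntax of clause 2.4 rather than stated here, and the function names are hypothetical.

```python
from collections import defaultdict


def classify(stream_id: int) -> str:
    """Rough category of a packet's stream_id (ranges assumed, see lead-in)."""
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"
    if 0xE0 <= stream_id <= 0xEF:
        return "video"
    if stream_id == 0xBE:
        return "padding"
    if stream_id in (0xBD, 0xBF):
        return "private"
    return "reserved"


def demultiplex(packets):
    """Concatenate packet payloads per stream_id, preserving packet order.

    `packets` is an iterable of (stream_id, payload_bytes) pairs. Padding
    packets are discarded because they carry no elementary stream data.
    """
    streams = defaultdict(bytearray)
    for stream_id, payload in packets:
        if classify(stream_id) == "padding":
            continue
        streams[stream_id] += payload
    return {sid: bytes(data) for sid, data in streams.items()}
```

Because each packet carries bytes from only one elementary stream, appending payloads per stream_id in the order the packets arrive is sufficient to reconstitute each elementary stream.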

0.2.2 Synchronization 

Synchronization among multiple streams is effected with presentation time stamps in the ISO/IEC 11172
multiplexed stream. The time stamps are expressed in units of a 90 kHz clock, so that one unit corresponds
to 1/90 000 s. Playback of N streams is synchronized by
adjusting the playback of all streams to a master time base rather than by adjusting the playback of one
stream to match that of another. The master time base may be one of the N decoders' clocks, the DSM or
channel clock, or it may be some external clock.
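Synchronization to a master time base can be sketched as follows: each stream's presentation units are released when the shared master clock reaches their presentation time stamp, so no stream is adjusted to match another. The 33-bit wrap-around of time stamps and the queue layout are assumptions, and the function names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000  # time stamps count periods of a 90 kHz clock
PTS_WRAP = 1 << 33     # assumed: time stamps are 33-bit fields that wrap around


def pts_to_seconds(pts: int) -> float:
    """Convert a 90 kHz time stamp to seconds."""
    return (pts % PTS_WRAP) / PTS_CLOCK_HZ


def due_units(master_clock_s: float, queues: dict) -> dict:
    """For each stream, return the time stamps whose presentation time has been reached.

    Every stream is compared against the same master clock rather than against
    the other streams. `queues` maps a stream name to a list of time stamps
    (hypothetical layout).
    """
    return {
        name: [pts for pts in pts_list if pts_to_seconds(pts) <= master_clock_s]
        for name, pts_list in queues.items()
    }
```

With video time stamps spaced 3003 ticks apart (about 33,4 ms) and audio time stamps spaced 2160 ticks apart (24 ms), a master clock reading of 40 ms releases the first two units of each stream.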

Because presentation time-stamps apply to the decoding of individual elementary streams, they reside in the 
packet layer. End-to-end synchronization occurs when encoders record time-stamps at capture time, when 
the time stamps propagate with associated coded data to decoders, and when decoders use those time-stamps 
to schedule presentations.

Synchronization is also possible with DSM timing time stamps in the multiplexed data stream. 
0.2.3 Relation to compression layer

The packet layer is independent of the compression layer in some senses, but not in all. It is independent in
the sense that packets need not start at compression layer start codes, as defined in parts 2 and 3. For 
example, a video packet may start at any byte in the video stream. However, time stamps encoded in 
packet headers apply to presentation times of compression layer constructs (namely, presentation units). 
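The independence of packet boundaries from compression layer start codes has a practical consequence: a start code may straddle two packets, so a decoder would search for start codes in the reassembled elementary stream rather than within individual packet payloads. The sketch below assumes the 24-bit start code prefix 0x000001; the helper name is hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # assumed prefix shared by start codes


def find_start_codes(data: bytes) -> list:
    """Return the byte offsets of every start code prefix (00 00 01) in `data`."""
    offsets = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(START_CODE_PREFIX, pos + 1)
    return offsets


# A video start code (00 00 01 B3) split across two packet payloads.
payloads = [b"\x12\x00\x00", b"\x01\xb3\x34"]
per_packet = [find_start_codes(p) for p in payloads]  # scanning packets separately
reassembled = find_start_codes(b"".join(payloads))    # scanning the elementary stream
```

Scanning each payload separately finds nothing, whereas scanning the reassembled bytes finds the start code at offset 1.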

0.3 System reference decoder 

Part 1 of ISO/IEC 11172 employs a "system target decoder" (STD) to provide a formalism for timing and
buffering relationships. Because the STD is parameterized in terms of fields defined in ISO/IEC 11172 (for
example, buffer sizes), each ISO/IEC 11172 multiplexed stream leads to its own parameterization of the
STD. It is up to encoders to ensure that the bitstreams they produce will play at normal speed, in forward
play, on the corresponding STDs. Physical decoders may assume that a stream plays properly on its STD; the physical
decoder must compensate for ways in which its design differs from that of the STD.
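The buffering side of the STD formalism can be illustrated with a simplified check of a single STD input buffer. This sketch is an assumption-laden simplification, not the normative STD of 2.4.2: bytes enter the buffer in discrete bursts at given times, each access unit is removed instantaneously at its decoding time, and arrivals at the same instant as a removal are processed first. The function name is hypothetical.

```python
def check_std_buffer(arrivals, removals, buffer_size):
    """Simulate one STD input buffer and report the first violation, if any.

    arrivals:    list of (time_s, n_bytes) for data entering the buffer.
    removals:    list of (time_s, n_bytes) for access units removed at decoding time.
    buffer_size: buffer capacity in bytes.
    Returns ("overflow", t), ("underflow", t), or None if the schedule is valid.
    """
    # kind 0 = arrival, kind 1 = removal; sorting puts arrivals first on ties.
    events = [(t, 0, n) for t, n in arrivals] + [(t, 1, n) for t, n in removals]
    occupancy = 0
    for t, kind, n in sorted(events):
        if kind == 0:
            occupancy += n
            if occupancy > buffer_size:
                return ("overflow", t)
        else:
            if occupancy < n:  # the access unit has not fully arrived
                return ("underflow", t)
            occupancy -= n
    return None
```

An encoder would run a check of this kind against the buffer sizes it declares; a schedule that returns `None` plays on the corresponding (simplified) STD.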
Information technology - Coding of moving
pictures and associated audio for digital storage 
media at up to about 1,5 Mbit/s — 

Part 1: 

Systems 

Section 1: General 

1.1 Scope 

This part of ISO/IEC 11172 specifies the system layer of the coding. It was developed principally to
support the combination of the video and audio coding methods defined in ISO/IEC 11172-2 and
ISO/IEC 11172-3. The system layer supports five basic functions:

a) the synchronization of multiple compressed streams on playback,

b) the interleaving of multiple compressed streams into a single stream,

c) the initialization of buffering for playback start up,

d) continuous buffer management, and

e) time identification.

An ISO/IEC 11172 multiplexed bitstream is constructed in two layers: the outermost layer is the system
layer, and the innermost is the compression layer. The system layer provides the functions necessary for
using one or more compressed data streams in a system. The video and audio parts of ISO/IEC 11172 define
the compression coding layer for audio and video data. Coding of other types of data is not defined by
ISO/IEC 11172, but is supported by the system layer.

1.2 Normative references 

The following International Standards contain provisions which, through reference in this text, constitute
provisions of this part of ISO/IEC 11172. At the time of publication, the editions indicated were valid. All
standards are subject to revision, and parties to agreements based on this part of ISO/IEC 11172 are encouraged
to investigate the possibility of applying the most recent editions of the standards indicated below.
Members of IEC and ISO maintain registers of currently valid International Standards.

CCIR Recommendation 601-2, Encoding parameters of digital television for studios.
CCIR Report 624-4, Characteristics of systems for monochrome and colour television.
CCIR Recommendation 648, Recording of audio signals.
CCITT Recommendation J.17, Pre-emphasis used on sound-programme circuits.
IEEE Draft Standard P1180/D2 1990, Specification for the implementation of 8 x 8 inverse discrete cosine transform.
IEC Publication 908:1987, CD Digital Audio System.
Section 2: Technical elements
2.1 Definitions 

For the purposes of ISO/IEC 11172, the following definitions apply. If specific to a part, this is noted in
square brackets. 

2.1.1 ac coefficient [video]: Any DCT coefficient for which the frequency in one or both dimensions
is non-zero. 

2.1.2 access unit [system]: In the case of compressed audio an access unit is an audio access unit. In
the case of compressed video an access unit is the coded representation of a picture.

2.1.3 adaptive segmentation [audio]: A subdivision of the digital representation of an audio signal
into variable segments of time.

2.1.4 adaptive bit allocation [audio]: The assignment of bits to subbands in a time and frequency 
varying fashion according to a psychoacoustic model. 

2.1.5 adaptive noise allocation [audio]: The assignment of coding noise to frequency bands in a 
time and frequency varying fashion according to a psychoacoustic model. 

2.1.6 alias [audio]: Mirrored signal component resulting from sub-Nyquist sampling. 

2.1.7 analysis filterbank [audio]: Filterbank in the encoder that transforms a broadband PCM audio 
signal into a set of subsampled subband samples. 

2.1.8 audio access unit [audio]: For Layers I and II an audio access unit is defined as the smallest 
part of the encoded bitstream which can be decoded by itself, where decoded means "fully reconstructed 
sound". For Layer III an audio access unit is part of the bitstream that is decodable with the use of 
previously acquired main information. 

2.1.9 audio buffer [audio]: A buffer in the system target decoder for storage of compressed audio data. 

2.1.10 audio sequence [audio]: A non-interrupted series of audio frames in which the following 
parameters are not changed: 

- ID

- Layer

- Sampling frequency

- For Layers I and II: bitrate index

2.1.11 backward motion vector [video]: A motion vector that is used for motion compensation 
from a reference picture at a later time in display order. 

2.1.12 Bark [audio]: Unit of critical band rate. The Bark scale is a non-linear mapping of the frequency 
scale over the audio range closely corresponding with the frequency selectivity of the human ear across the 
band. 

2.1.13 bidirectionally predictive-coded picture; B-picture [video]: A picture that is coded 
using motion compensated prediction from a past and/or future reference picture. 

2.1.14 bitrate: The rate at which the compressed bitstream is delivered from the storage medium to the 
input of a decoder. 

2.1.15 block companding [audio]: Normalizing of the digital representation of an audio signal 
within a certain time period. 

2.1.16 block [video]: An 8-row by 8-column orthogonal block of pels. 

2.1.17 bound [audio]: The lowest subband in which intensity stereo coding is used. 
2.1.18 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8-bits
from the first bit in the stream. 

2.1.19 byte: Sequence of 8-bits. 
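Byte alignment reduces to a test on a bit offset. The helpers below are hypothetical and assume that bit positions are counted from 0 at the first bit of the stream.

```python
def is_byte_aligned(bit_position: int) -> bool:
    """True if the bit lies a whole number of bytes from the first bit of the stream."""
    return bit_position % 8 == 0


def bits_to_next_byte_boundary(bit_position: int) -> int:
    """Number of bits a parser would have to skip to become byte-aligned again."""
    return (-bit_position) % 8
```

For instance, a parser positioned at bit 13 is not byte-aligned and would skip 3 bits to reach bit 16.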

2.1.20 channel: A digital medium that stores or transports an ISO/IEC 11172 stream. 
2.1.21 channel [audio]: The left and right channels of a stereo signal.

2.1.22 chrominance (component) [video]: A matrix, block or single pel representing one of the 
two colour difference signals related to the primary colours in the manner defined in CCIR Rec 601. The 
symbols used for the colour difference signals are Cr and Cb. 

2.1.23 coded audio bitstream [audio]: A coded representation of an audio signal as specified in 
ISO/IEC 11172-3.

2.1.24 coded video bitstream [video]: A coded representation of a series of one or more pictures as
specified in ISO/IEC 11172-2.

2.1.25 coded order [video]: The order in which the pictures are stored and decoded. This order is not 
necessarily the same as the display order. 

2.1.26 coded representation: A data element as represented in its encoded form. 

2.1.27 coding parameters [video]: The set of user-definable parameters that characterize a coded video 
bitstream. Bitstreams are characterised by coding parameters. Decoders are characterised by the bitstreams 
that they are capable of decoding. 

2.1.28 component [video]: A matrix, block or single pel from one of the three matrices (luminance
and two chrominance) that make up a picture. 

2.1.29 compression: Reduction in the number of bits used to represent an item of data. 

2.1.30 constant bitrate coded video [video]: A compressed video bitstream with a constant
average bitrate. 

2.1.31 constant bitrate: Operation where the bitrate is constant from start to finish of the compressed 
bitstream. 

2.1.32 constrained parameters [video]: The values of the set of coding parameters defined in
2.4.3.2 of ISO/IEC 11172-2. 

2.1.33 constrained system parameter stream (CSPS) [system]: An ISO/IEC 11172 
multiplexed stream for which the constraints defined in 2.4.6 of this part of ISO/IEC 11172 apply.

2.1.34 CRC: Cyclic redundancy code.

2.1.35 critical band rate [audio]: Psychoacoustic function of frequency. At a given audible
frequency it is proportional to the number of critical bands below that frequency. The units of the critical
band rate scale are Barks.

2.1.36 critical band [audio]: Psychoacoustic measure in the spectral domain which corresponds to the
frequency selectivity of the human ear. This selectivity is expressed in Bark.

2.1.37 data element: An item of data as represented before encoding and after decoding.

2.1.38 dc-coefficient [video]: The DCT coefficient for which the frequency is zero in both
dimensions.
2.1.39 dc-coded picture; D-picture [video]: A picture that is coded using only information from
itself. Of the DCT coefficients in the coded representation, only the dc-coefficients are present.

2.1.40 DCT coefficient: The amplitude of a specific cosine basis function. 

2.1.41 decoded stream: The decoded reconstruction of a compressed bitstream. 

2.1.42 decoder input buffer [video]: The first-in first-out (FIFO) buffer specified in the video 
buffering verifier. 

2.1.43 decoder input rate [video]: The data rate specified in the video buffering verifier and encoded 
in the coded video bitstream. 

2.1.44 decoder: An embodiment of a decoding process. 

2.1.45 decoding (process): The process defined in ISO/IEC 11172 that reads an input coded bitstream 
and produces decoded pictures or audio samples. 

2.1.46 decoding time-stamp; DTS [system]: A field that may be present in a packet header that 
indicates the time that an access unit is decoded in the system target decoder. 

2.1.47 de-emphasis [audio]: Filtering applied to an audio signal after storage or transmission to undo 
a linear distortion due to emphasis. 

2.1.48 dequantization [video]: The process of rescaling the quantized DCT coefficients after their 
representation in the bitstream has been decoded and before they are presented to the inverse DCT. 

2.1.49 digital storage media; DSM: A digital storage or transmission device or system. 

2.1.50 discrete cosine transform; DCT [video]: Either the forward discrete cosine transform or the 
inverse discrete cosine transform. The DCT is an invertible, discrete orthogonal transformation. The 
inverse DCT is defined in annex A of ISO/IEC 11172-2.

2.1.51 display order [video]: The order in which the decoded pictures should be displayed. Normally 
this is the same order in which they were presented at the input of the encoder. 

2.1.52 dual channel mode [audio]: A mode, where two audio channels with independent programme 
contents (e.g. bilingual) are encoded within one bitstream. The coding process is the same as for the stereo 
mode. 

2.1.53 editing: The process by which one or more compressed bitstreams are manipulated to produce a 
new compressed bitstream. Conforming edited bitstreams must meet the requirements defined in ISO/IEC 
11172. 

2.1.54 elementary stream [system]: A generic term for one of the coded video, coded audio or other 
coded bitstreams. 

2.1.55 emphasis [audio]: Filtering applied to an audio signal before storage or transmission to 
improve the signal-to-noise ratio at high frequencies. 

2.1.56 encoder: An embodiment of an encoding process. 

2.1.57 encoding (process): A process, not specified in ISO/IEC 11172, that reads a stream of input 
pictures or audio samples and produces a valid coded bitstream as defined in ISO/IEC 11172.

2.1.58 entropy coding: Variable length lossless coding of the digital representation of a signal to 
reduce redundancy. 

2.1.59 fast forward playback [video]: The process of displaying a sequence, or parts of a sequence, 
of pictures in display-order faster than real-time. 
2.1.60 FFT: Fast Fourier Transformation. A fast algorithm for performing a discrete Fourier transform
(an orthogonal transform). 

2.1.61 filterbank [audio]: A set of band-pass filters covering the entire audio frequency range. 

2.1.62 fixed segmentation [audio]: A subdivision of the digital representation of an audio signal 
into fixed segments of time. 

2.1.63 forbidden: The term "forbidden" when used in the clauses defining the coded bitstream indicates 
that the value shall never be used. This is usually to avoid emulation of start codes. 

2.1.64 forced updating [video]: The process by which macroblocks are intra-coded from time-to-time 
to ensure that mismatch errors between the inverse DCT processes in encoders and decoders cannot build up 
excessively. 

2.1.65 forward motion vector [video]: A motion vector that is used for motion compensation from 
a reference picture at an earlier time in display order. 

2.1.66 frame [audio]: A part of the audio signal that corresponds to audio PCM samples from an 
audio access unit.

2.1.67 free format [audio]: Any bitrate other than the defined bitrates that is less than the maximum 
valid bitrate for each layer. 

2.1.68 future reference picture [video]: The future reference picture is the reference picture that 
occurs at a later time than the current picture in display order. 

2.1.69 granules [Layer II] [audio]: The set of 3 consecutive subband samples from all 32 subbands
that are considered together before quantization. They correspond to 96 PCM samples. 

2.1.70 granules [Layer III] [audio]: 576 frequency lines that carry their own side information. 

2.1.71 group of pictures [video]: A series of one or more coded pictures intended to assist random 
access. The group of pictures is one of the layers in the coding syntax defined in ISO/IEC 11172-2. 

2.1.72 Hann window [audio]: A time function applied sample-by-sample to a block of audio samples 
before Fourier transformation.

2.1.73 Huffman coding: A specific method for entropy coding. 

2.1.74 hybrid filterbank [audio]: A serial combination of subband filterbank and MDCT. 

2.1.75 IMDCT [audio]: Inverse Modified Discrete Cosine Transform. 

2.1.76 intensity stereo [audio]: A method of exploiting stereo irrelevance or redundancy in 
stereophonic audio programmes based on retaining at high frequencies only the energy envelope of the right
and left channels. 

2.1.77 interlace [video]: The property of conventional television pictures where alternating lines of 
the picture represent different instances in time. 

2.1.78 intra coding [video]: Coding of a macroblock or picture that uses information only from that
macroblock or picture. 

2.1.79 intra-coded picture; I-picture [video]: A picture coded using information only from itself. 

2.1.80 ISO/IEC 11172 (multiplexed) stream [system]: A bitstream composed of zero or more 
elementary streams combined in the manner defined in this part of ISO/IEC 11172.
2.1.81 joint stereo coding [audio]: Any method that exploits stereophonic irrelevance or stereophonic redundancy.

2.1.82 joint stereo mode [audio]: A mode of the audio coding algorithm using joint stereo coding.

2.1.83 layer [audio]: One of the levels in the coding hierarchy of the audio system defined in ISO/IEC 11172-3.

2.1.84 layer [video and systems]: One of the levels in the data hierarchy of the video and system
specifications defined in this part of ISO/IEC 11172 and ISO/IEC 11172-2.

2.1.85 luminance (component) [video]: A matrix, block or single pel representing a monochrome
representation of the signal and related to the primary colours in the manner defined in CCIR Rec 601. The
symbol used for luminance is Y.

2.1.86 macroblock [video]: The four 8 by 8 blocks of luminance data and the two corresponding 8 by
8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture.
Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel
values and other data elements.
The usage is clear from the context.
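The macroblock definition implies a simple count of macroblocks per picture. The sketch below assumes that a picture whose luminance dimensions are not multiples of 16 is covered by rounding the macroblock count up; the function names are hypothetical.

```python
MB_SIZE = 16  # a macroblock covers a 16 by 16 section of the luminance component


def macroblock_grid(width: int, height: int) -> tuple:
    """Macroblock columns and rows for a picture with the given luminance size.

    Partial macroblocks at the right or bottom edge are rounded up (assumption).
    """
    return (width + MB_SIZE - 1) // MB_SIZE, (height + MB_SIZE - 1) // MB_SIZE


def blocks_per_macroblock() -> int:
    """Four 8 by 8 luminance blocks plus two 8 by 8 chrominance blocks (Cb, Cr)."""
    return 4 + 2
```

For a 352 x 240 picture this yields a 22 x 15 grid, that is 330 macroblocks, each carrying six 8 by 8 blocks.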

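2.1.87 mapping [audio]: Conversion of an audio signal from time to frequency domain by subband filtering and/or by MDCT.

2.1.88 masking [audio]: A property of the human auditory system by which an audio signal cannot be perceived in the presence of another audio signal.

2.1.89 masking threshold [audio]: A function in frequency and time below which an audio signal cannot be perceived by the human auditory system.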

2.1.90 MDCT [audio]: Modified Discrete Cosine Transform. 

2.1.91 motion compensation [video]: The use of motion vectors to improve the efficiency of the
prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future
reference pictures containing previously decoded pel values that are used to form the prediction error signal.

2.1.92 motion estimation [video]: The process of estimating motion vectors during the encoding
process.

2.1.93 motion vector [video]: A two-dimensional vector used for motion compensation that provides
an offset from the coordinate position in the current picture to the coordinates in a reference picture.

2.1.94 MS stereo [audio]: A method of exploiting stereo irrelevance or redundancy in stereophonic
audio programmes based on coding the sum and difference signal instead of the left and right channels.

2.1.95 non-intra coding [video]: Coding of a macroblock or picture that uses information both from
itself and from macroblocks and pictures occurring at other times.

2.1.96 non-tonal component [audio]: A noise-like component of an audio signal. 

2.1.97 Nyquist sampling: Sampling at or above twice the maximum bandwidth of a signal. 

2.1.98 pack [system]: A pack consists of a pack header followed by one or more packets. It is a layer
in the system coding syntax described in this part of ISO/IEC 11172.

2.1.99 packet data [system]: Contiguous bytes of data from an elementary stream present in a packet.

2.1.100 packet header [system]: The data structure used to convey information about the elementary
stream data contained in the packet data.
2.1.101 packet [system]: A packet consists of a header followed by a number of contiguous bytes
from an elementary data stream. It is a layer in the system coding syntax described in this part of ISO/IEC 
11172. 

2.1.102 padding [audio]: A method to adjust the average length in time of an audio frame to the 
duration of the corresponding PCM samples, by conditionally adding a slot to the audio frame. 

2.1.103 past reference picture [video]: The past reference picture is the reference picture that occurs 
at an earlier time than the current picture in display order. 

2.1.104 pel aspect ratio [video]: The ratio of the nominal vertical height of a pel on the display to its
nominal horizontal width. 

2.1.105 pel [video]: Picture element. 

2.1.106 picture period [video]: The reciprocal of the picture rate. 

2.1.107 picture rate [video]: The nominal rate at which pictures should be output from the decoding 
process. 

2.1.108 picture [video]: Source, coded or reconstructed image data. A source or reconstructed picture 
consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance 
signals. The Picture layer is one of the layers in the coding syntax defined in ISO/IEC 11172-2. Note that
the term "picture" is always used in ISO/IEC 11172 in preference to the terms field or frame.

2.1.109 polyphase filterbank [audio]: A set of equal bandwidth filters with special phase
interrelationships, allowing for an efficient implementation of the filterbank. 

2.1.110 prediction [video]: The use of a predictor to provide an estimate of the pel value or data 
element currently being decoded.

2.1.111 predictive-coded picture; P-picture [video]: A picture that is coded using motion 
compensated prediction from the past reference picture. 

2.1.112 prediction error [video]: The difference between the actual value of a pel or data element and 
its predictor. 

2.1.113 predictor [video]: A linear combination of previously decoded pel values or data elements. 

2.1.114 presentation time-stamp; PTS [system]: A field that may be present in a packet header 
that indicates the time that a presentation unit is presented in the system target decoder. 

2.1.115 presentation unit; PU [system]: A decoded audio access unit or a decoded picture. 

2.1.116 psychoacoustic model [audio]: A mathematical model of the masking behaviour of the 
human auditory system. 

2.1.117 quantization matrix [video]: A set of sixty-four 8-bit values used by the dequantizer. 

2.1.118 quantized DCT coefficients [video]: DCT coefficients before dequantization. A variable 
length coded representation of quantized DCT coefficients is stored as part of the compressed video 
bitstream. 

2.1.119 quantizer scalefactor [video]: A data element represented in the bitstream and used by the 
decoding process to scale the dequantization. 
2.1.120 random access: The process of beginning to read and decode the coded bitstream at an arbitrary
point.

2.1.121 reference picture [video]: Reference pictures are the nearest adjacent I- or P-pictures to the 
current picture in display order. 

2.1.122 reorder buffer [video]: A buffer in the system target decoder for storage of a reconstructed I- 
picture or a reconstructed P-picture. 

2.1.123 requantization [audio]: Decoding of coded subband samples in order to recover the original
quantized values.

2.1.124 reserved: The term "reserved" when used in the clauses defining the coded bitstream indicates 
that the value may be used in the future for ISO/IEC defined extensions. 

2.1.125 reverse playback [video]: The process of displaying the picture sequence in the reverse of 
display order. 

2.1.126 scalefactor band [audio]: A set of frequency lines in Layer III which are scaled by one 
scalefactor.

2.1.127 scalefactor index [audio]: A numerical code for a scalefactor. 

2.1.128 scalefactor [audio]: Factor by which a set of values is scaled before quantization. 

2.1.129 sequence header [video]: A block of data in the coded bitstream containing the coded 
representation of a number of data elements.

2.1.130 side information: Information in the bitstream necessary for controlling the decoder. 

2.1.131 skipped macroblock [video]: A macroblock for which no data are stored. 

2.1.132 slice [video]: A series of macroblocks. It is one of the layers of the coding syntax defined in
ISO/IEC 11172-2.

2.1.133 slot [audio]: A slot is an elementary part in the bitstream. In Layer I a slot equals four bytes;
in Layers II and III it equals one byte.

2.1.134 source stream: A single non-multiplexed stream of samples before compression coding. 

2.1.135 spreading function [audio]: A function that describes the frequency spread of masking. 

2.1.136 start codes [system and video]: 32-bit codes embedded in the coded bitstream that are
unique. They are used for several purposes including identifying some of the layers in the coding syntax.

2.1.137 STD input buffer [system]: A first-in first-out buffer at the input of the system target 
decoder for storage of compressed data from elementary streams before decoding. 

2.1.138 stereo mode [audio]: Mode, where two audio channels which form a stereo pair (left and
right) are encoded within one bitstream. The coding process is the same as for the dual channel mode. 

2.1.139 stuffing (bits); stuffing (bytes): Code-words that may be inserted into the compressed
bitstream and that are discarded in the decoding process. Their purpose is to increase the bitrate of the stream.

2.1.140 subband [audio]: Subdivision of the audio frequency band. 

2.1.141 subband filterbank [audio]: A set of band filters covering the entire audio frequency range.
In ISO/IEC 11172-3 the subband filterbank is a polyphase filterbank.
2.1.142 subband samples [audio]: The subband filterbank within the audio encoder creates a filtered
and subsampled representation of the input audio stream. The filtered samples are called subband samples.
From 384 time-consecutive input audio samples, 12 time-consecutive subband samples are generated within
each of the 32 subbands.

2.1.143 syncword [audio]: A 12-bit code embedded in the audio bitstream that identifies the start of a
frame.

2.1.144 synthesis filterbank [audio]: Filterbank in the decoder that reconstructs a PCM audio
signal from subband samples. 

2.1.145 system header [system]: The system header is a data structure defined in this part of
ISO/IEC 11172 that carries information summarising the system characteristics of the ISO/IEC 11172
multiplexed stream.

2.1.146 system target decoder; STD [system]: A hypothetical reference model of a decoding 
process used to describe the semantics of an ISO/IEC 11172 multiplexed bitstream.

2.1.147 time-stamp [system]: A term that indicates the time of an event.

2.1.148 triplet [audio]: A set of 3 consecutive subband samples from one subband. A triplet from 
each of the 32 subbands forms a granule. 

2.1.149 tonal component [audio]: A sinusoid-like component of an audio signal. 

2.1.150 variable bitrate: Operation where the bitrate varies with time during the decoding of a 
compressed bitstream.

2.1.151 variable length coding; VLC: A reversible procedure for coding that assigns shorter code- 
words to frequent events and longer code-words to less frequent events. 

2.1.152 video buffering verifier; VBV [video]: A hypothetical decoder that is conceptually 
connected to the output of the encoder. Its purpose is to provide a constraint on the variability of the data 
rate that an encoder or editing process may produce. 

2.1.153 video sequence [video]: A series of one or more groups of pictures. It is one of the layers of 
the coding syntax defined in ISO/IEC 11172-2.

2.1.154 zig-zag scanning order [video]: A specific sequential ordering of the DCT coefficients from 
(approximately) the lowest spatial frequency to the highest.
2.2 Symbols and abbreviations

The mathematical operators used to describe this International Standard are similar to those used in the C
programming language. However, integer division with truncation and integer division with rounding are specifically defined.

2.2.1 Arithmetic operators 

+ Addition.

- Subtraction (as a binary operator) or negation (as a unary operator).

++ Increment.

-- Decrement.

* Multiplication.

^ Power.



/ Integer division with truncation of the result toward zero. For example, 7/4 and -7/-4 are
truncated to 1 and -7/4 and 7/-4 are truncated to -1.

// Integer division with rounding to the nearest integer. Half-integer values are rounded away
from zero unless otherwise specified. For example 3//2 is rounded to 2, and -3//2 is rounded
to -2.

DIV Integer division with truncation of the result towards -∞.

| | Absolute value. |x| = x when x > 0
|x| = 0 when x == 0
|x| = -x when x < 0

% Modulus operator. Defined only for positive numbers. 

Sign( ) Sign(x) = 1 when x > 0
Sign(x) = 0 when x == 0
Sign(x) = -1 when x < 0

NINT ( ) Nearest integer operator. Returns the nearest integer value to the real-valued argument.
Half-integer values are rounded away from zero.
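The division operators and NINT differ only in how they dispose of a fractional part. The following C sketch, which assumes 64-bit operands, a nonzero divisor and values well inside the integer limits, shows one way to realise them so that the worked examples above can be checked. The helper names div_trunc, div_round, div_floor, nint and sign are illustrative only and are not defined by this part of ISO/IEC 11172.

```c
#include <stdint.h>

/* "/": integer division truncated toward zero (C99 integer division already does this). */
static int64_t div_trunc(int64_t a, int64_t b) { return a / b; }

/* "//": integer division rounded to the nearest integer,
   half-integer values rounded away from zero. */
static int64_t div_round(int64_t a, int64_t b) {
    int64_t q = a / b;                       /* truncated quotient */
    int64_t r = a % b;                       /* remainder, same sign as a */
    int64_t ar = r < 0 ? -r : r;
    int64_t ab = b < 0 ? -b : b;
    if (2 * ar >= ab)                        /* fractional part is at least 1/2 */
        q += ((a < 0) != (b < 0)) ? -1 : 1;  /* step away from zero */
    return q;
}

/* "DIV": integer division truncated toward minus infinity. */
static int64_t div_floor(int64_t a, int64_t b) {
    int64_t q = a / b;
    if (a % b != 0 && ((a < 0) != (b < 0)))
        q -= 1;                              /* inexact negative quotient: one lower */
    return q;
}

/* NINT(): nearest integer, half-integer values rounded away from zero. */
static int64_t nint(double x) {
    int64_t t = (int64_t)x;                  /* truncate toward zero */
    double frac = x - (double)t;             /* fractional part, computed exactly */
    if (frac >= 0.5) t += 1;
    else if (frac <= -0.5) t -= 1;
    return t;
}

/* Sign(): 1, 0 or -1. */
static int sign(int64_t x) { return (x > 0) - (x < 0); }
```

For example, div_floor(-7, 4) gives -2 whereas div_trunc(-7, 4) gives -1; the two agree whenever the operands have the same sign.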

sin Sine. 

cos Cosine. 

exp Exponential. 

√ Square root.

log10 Logarithm to base ten.

loge Logarithm to base e.

log2 Logarithm to base 2.

2.2.2 Logical operators 

|| Logical OR.
&& Logical AND. 
! Logical NOT.

2.2.3 Relational operators 

> Greater than. 

>= Greater than or equal to. 

< Less than. 

<= Less than or equal to. 

== Equal to.

!= Not equal to.

max [,...,] The maximum value in the argument list.
min [,...,] The minimum value in the argument list.

2.2.4 Bitwise operators 

A two's complement number representation is assumed where the bitwise operators are used.

& AND.

| OR.

>> Shift right with sign extension.

<< Shift left with zero fill.

2.2.5 Assignment 

= Assignment operator.

2.2.6 Mnemonics 

The following mnemonics are defined to describe the different data types used in the coded bit-stream. 

bslbf Bit string, left bit first, where "left" is the order in which bit strings are written in
ISO/IEC 11172. Bit strings are written as a string of 1s and 0s within single quote
marks, e.g. '1000 0001'. Blanks within a bit string are for ease of reading and have no
significance.

ch Channel. If ch has the value 0, the left channel of a stereo signal or the first of two
independent signals is indicated. (Audio)

nch Number of channels; equal to 1 for single_channel mode, 2 in other modes. (Audio) 

gr Granule of 3 * 32 subband samples in audio Layer II, 18 * 32 subband samples in
audio Layer III. (Audio)

main_data The main_data portion of the bitstream contains the scalefactors, Huffman encoded
data, and ancillary information. (Audio)

main_data_beg The location in the bitstream of the beginning of the main_data for the frame. The
location is equal to the ending location of the previous frame's main_data plus one bit.
It is calculated from the main_data_end value of the previous frame. (Audio)

part2_length The number of main_data bits used for scalefactors. (Audio) 
rpchof Remainder polynomial coefficients, highest order first. (Audio)

sb Subband. (Audio)

sblimit The number of the lowest subband for which no bits are allocated. (Audio)

scfsi Scalefactor selection information. (Audio)

switch_point_l Number of scalefactor band (long block scalefactor band) from which point on window
switching is used. (Audio)

switch_point_s Number of scalefactor band (short block scalefactor band) from which point on window
switching is used. (Audio)

uimsbf Unsigned integer, most significant bit first.

vlclbf Variable length code, left bit first, where "left" refers to the order in which the VLC
codes are written.

window Number of the actual time slot in case of block_type == 2, 0 <= window <= 2. (Audio)

The byte order of multi-byte words is most significant byte first.
2.2.7 Constants 

π 3,14159265358...
e 2,71828182845... 



2.3 Method of describing bit stream syntax 

The bit stream retrieved by the decoder is described in 2.4.3. Each data item in the bit stream is in bold
type. It is described by its name, its length in bits, and a mnemonic for its type and order of transmission.

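The action caused by a decoded data element in a bit stream depends on the value of that data element and
on data elements previously decoded. The decoding of the data elements and the definition of the state
variables used in their decoding are described in 2.4.4. The following constructs are used to express the
conditions when data elements are present, and are in normal type: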



while ( condition ) {
    data_element
    . . .
}

If the condition is true, then the group of data elements occurs next in the data stream. This
repeats until the condition is not true.

do {
    data_element
    . . .
} while ( condition )

The data element always occurs at least once. The data element is repeated until the condition is
not true.

if ( condition ) {
    data_element
    . . .
}

If the condition is true, then the first group of data elements occurs next in the data stream.

else {
    data_element
    . . .
}

If the condition is not true, then the second group of data elements occurs next in the data stream.



Exhibit 18, page 19 



13 



ISO/IEC 11172-1: 1993 (E) 



© ISO/IEC 



for ( expr1; expr2; expr3 ) {
    data_element
    . . .
}

expr1 is an expression specifying the initialization of the loop. Normally it specifies the initial
state of the counter. expr2 is a condition specifying a test made before each iteration of the loop.
The loop terminates when the condition is not true. expr3 is an expression that is performed at the
end of each iteration of the loop; normally it increments a counter.

Note that the most common usage of this construct is as follows:

for ( i = 0; i < n; i++ ) {
    data_element
    . . .
}

The group of data elements occurs n times. Conditional constructs within the group of data elements
may depend on the value of the loop control variable i, which is set to zero for the first occurrence,
incremented to one for the second occurrence, and so forth.

As noted, the group of data elements may contain nested conditional constructs. For compactness the { }
may be omitted when only one data element follows.

data_element [ ] data_element [ ] is an array of data. The number of data elements is indicated by
the context.

data_element [n] data_element [n] is the n+1th element of an array of data.

data_element [m][n] data_element [m][n] is the m+1,n+1 th element of a two-dimensional array of
data.

data_element [l][m][n] data_element [l][m][n] is the l+1,m+1,n+1 th element of a three-dimensional
array of data.

data_element [m..n] is the inclusive range of bits between bit m and bit n in the data_element.

While the syntax is expressed in procedural terms, it should not be assumed that 2.4.3 implements a
satisfactory decoding procedure. In particular, it defines a correct and error-free input bitstream. Actual
decoders must include a means to look for start codes in order to begin decoding correctly, and to identify
errors, erasures or insertions while decoding. The methods to identify these situations, and the actions to be
taken, are not standardized.

Definition of bytealigned function 

The function bytealigned() returns 1 if the current position is on a byte boundary, that is the next bit in the
bit stream is the first bit in a byte. Otherwise it returns 0.

Definition of nextbits function 

The function nextbits() permits comparison of a bit string with the next bits to be decoded in the bit
stream.

Definition of next_start_code function 

The next_start_code function removes any zero bit and zero byte stuffing and locates the next start code.



Syntax                                                        No. of bits   Mnemonic

next_start_code() {
    while ( !bytealigned() )
        zero_bit                                                   1         "0"
    while ( nextbits() != '0000 0000 0000 0000 0000 0001' )
        zero_byte                                                  8         "00000000"
}



This function checks whether the current position is byte aligned. If it is not, zero stuffing bits are present.
After that any number of zero bytes may be present before the start-code. Therefore start-codes are always
byte aligned and may be preceded by any number of zero stuffing bits.
2.4 Requirements

2.4.1 Coding structure and parameters 

[...] information allows elementary streams to be replayed in synchronism.

ISO/IEC 11172 multiplexed stream

[...] The presentation unit for a video elementary stream is a picture. The corresponding access unit
includes all the coded data for the picture. The access unit [...] includes any preceding data [...]
(see 2.4.2.4 in ISO/IEC 11172-2), starting with the group start code. [...] defined in 2.4.3 [...]

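A packet consists of a packet header followed by packet data. The packet header begins with a 32-bit
start-code that also identifies the stream to which the packet data belongs. The packet header may contain
decoding and/or presentation time-stamps (DTS and PTS) that refer to the first access unit that commences
in the packet. The packet data contains a variable number of contiguous bytes from one elementary stream.

A pack consists of a pack header followed by zero or more packets. The pack header begins with a 32-bit
start-code. The pack header is used to store timing and bitrate information.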

2.4.2 System target decoder 

[Figure 2 - Diagram of system target decoder. Recoverable labels: M(i), the i-th byte of the multiplexed stream, entering at tm(i); system control; input buffers B1 ... Bn; the j-th access unit; reorder buffers On; the k-th presentation unit Pn(k), presented at tpn(k).]

Notation 

The following notation is used to describe the system target decoder and is partially illustrated in figure 2.

i, i' are indices to bytes in the ISO/IEC 11172 multiplexed stream. The first byte has index 0.

j is an index to access units in the elementary streams. 

k, k',k" are indices to presentation units in the elementary streams. 

n is an index to the elementary streams. 

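M(i) is the i-th byte in the ISO/IEC 11172 multiplexed stream.

tm(i) indicates the time, measured in seconds, at which the i-th byte of the ISO/IEC 11172 multiplexed stream
enters the system target decoder. The value tm(0) is an arbitrary constant.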

SCR(i) is the time encoded in the SCR field measured in units of the 90 kHz system clock 
where i is the byte index of the final byte of the SCR field. 

An(j) is the j-th access unit in elementary stream n. Note that access units are indexed in decoding order.

tdn(j) is the decoding time, measured in seconds, in the system target decoder of the j-th access unit in
elementary stream n.

Pn(k) is the k-th presentation unit in elementary stream n.

tpn(k) is the presentation time, measured in seconds, in the system target decoder of the k-th presentation
unit in elementary stream n.

t is time measured in seconds. 

Fn(t) is the fullness, measured in bytes, of the system target decoder input buffer for elementary stream n
at time t.

Bn is the input buffer in the system target decoder for elementary stream n. 

BSn is the size of the system target decoder input buffer, measured in bytes, for elementary stream n.

Dn is the decoder for elementary stream n. 

On is the reorder buffer for elementary stream n. 
System Clock Frequency

^Siss of^ScST *** f,eids der,ned in 24j ^ ™» tafonnau ° n * - 

The value of the system clock frequency is measured in hertz and shall meet the following constraints:

90 000 - 4,5 <= system_clock_frequency <= 90 000 + 4,5
rate of change of system_clock_frequency with time <= 250 * 10^-6 Hz/s

The notation "system_clock_frequency" is used in several places in this part of ISO/IEC 11172 to refer to
the frequency of a clock meeting these requirements. For notational convenience, equations in which SCR,
PTS or DTS appear lead to values of time which are accurate to some integral multiple of
(2^33/system_clock_frequency). This is due to the 33-bit encoding of timing information.
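As a worked check of these limits, the C sketch below tests a measured frequency against the ±4,5 Hz tolerance and computes how long a 33-bit count of the clock runs before it wraps, which is the period (2^33/system_clock_frequency) mentioned above. The macro and function names are illustrative only.

```c
#include <stdint.h>

#define SCF_NOMINAL   90000.0   /* nominal system clock frequency, Hz */
#define SCF_TOLERANCE 4.5       /* permitted deviation, Hz */

/* 1 if f (Hz) lies within 90 000 +/- 4,5 Hz, otherwise 0. */
static int scf_within_tolerance(double f) {
    return f >= SCF_NOMINAL - SCF_TOLERANCE && f <= SCF_NOMINAL + SCF_TOLERANCE;
}

/* Seconds until a 33-bit counter driven by a clock of f Hz wraps: 2^33 / f. */
static double wrap_period_seconds(double f) {
    return (double)(UINT64_C(1) << 33) / f;
}
```

At the nominal frequency the wrap period is about 95 443,7 s, roughly 26,5 hours, so time-stamp values are unambiguous only within that window.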

Input to the System Target Decoder 

Data from the ISO/IEC 11172 multiplexed stream enters the system target decoder. The i-th byte, M(i),
enters at time tm(i). The time at which this byte enters the system target decoder can be recovered from the
input stream by decoding the input system clock reference (SCR) fields encoded in the pack header. The
value encoded in the SCR(i') field indicates time tm(i'), where i' refers to the last byte of the SCR field.

Specifically:

SCR(i') = NINT (system_clock_frequency * tm(i')) % 2^33

The input arrival time, tm(i), for all other bytes shall be constructed from SCR(i') and the rate at which data
arrives, where the arrival rate within each pack is the value represented in the mux_rate field in that pack's
header (see 2.4.3.2 and 2.4.4.2):



                 SCR(i')                      i - i'
tm(i) = -------------------------  +  -----------------
          system_clock_frequency        mux_rate * 50
where:

i' is the index of the final byte of the system_clock_reference field in the pack header,

i is the index of any byte in the pack, including the pack header,

SCR(i') is the time encoded in the system_clock_reference field in units of the system clock,

mux_rate is a field defined in 2.4.3.2 and 2.4.4.2.
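The arrival-time rule can be evaluated directly. The C sketch below assumes the 33-bit SCR value has already been extracted from the pack header and that any wraparound is handled by the caller; it returns tm(i) in seconds for byte index i of a pack whose SCR field ends at byte index i_scr. The function and parameter names are illustrative.

```c
#include <stdint.h>

/* tm(i) = SCR(i') / system_clock_frequency + (i - i') / (mux_rate * 50),
   using real (not integer) division throughout.
   scr:      value of the system_clock_reference field (33 bits)
   i_scr:    index i' of the final byte of the SCR field
   i:        index of any byte in the same pack (may precede i')
   mux_rate: value of the pack's mux_rate field, in units of 50 bytes/s */
static double arrival_time(uint64_t scr, int64_t i_scr, int64_t i,
                           uint32_t mux_rate, double system_clock_frequency) {
    double bytes_per_second = (double)mux_rate * 50.0;
    return (double)scr / system_clock_frequency
         + (double)(i - i_scr) / bytes_per_second;
}
```

For example, with mux_rate = 3000 (150 000 bytes/s) and an SCR of 90 000 (1 s), the byte 150 positions after the end of the SCR field arrives at 1,001 s.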

After delivery of the last byte of a pack there may be a time interval during which no bytes are delivered to 
the input of the system target decoder.

Variable rate operation of the system target decoder is provided through the use of the mux_rate field, the
value of which may vary from pack to pack, and through the fact that the data rate entering the system target
decoder may drop to zero after the last byte of one pack arrives and before the following pack header arrives.

Buffering 

The packet data from elementary stream n is passed to the input buffer for stream n, Bn. Transfer of byte
M(i) from the system target decoder input to Bn is instantaneous, so that byte M(i) enters the buffer for
stream n, of size BSn, at time tm(i).

Bytes present in the pack, system or packet headers of the ISO/IEC 11172 multiplexed stream but not part of
the packet data (for example the SCR, DTS, PTS and packet_length fields, see 2.4.3) are not delivered to
any of the buffers, but may be used to control the system.
The input buffer sizes BS1 through BSn are given by parameters in the syntax (see 2.4.3 and 2.4.4).

At the decoding time, tdn(j), all the data for the access unit that has been in the input buffer longest (An(j))
is removed instantaneously. In the case of a video elementary stream, group of pictures and sequence header
data that precede the picture are removed at the same time. In the case of the first coded picture of a video
sequence, any zero bit or byte stuffing immediately preceding the sequence header is removed at the same
time. Note that this only applies to the first picture of a video sequence and not to additional occurrences of
a sequence header within a video sequence. As the access unit is removed from the buffer it is
instantaneously decoded into a presentation unit.

Decoding 

Elementary streams buffered in B1 through Bn are decoded instantaneously by decoders D1 through Dn and
may be delayed in reorder buffers O1 through On before being presented to the viewer at the output of the
system target decoder. Reorder buffers are used only in video decoding to store I-pictures and P-pictures
while the sequence of presentation units is reordered before presentation.

In the case of a video elementary stream, some access units may not be stored in presentation order. These
access units will need to be reordered before presentation. In particular, an I-picture or a P-picture stored 
before one or more B-pictures must be delayed in the reorder buffer, On, of the system target decoder before 
being presented. It should be delayed until the next I-picture or P-picture is decoded. While it is stored in 
the reorder buffer, the subsequent B-pictures are decoded and presented. 

If Pn(k) is an I-picture or a P-picture that needs to be reordered before presentation, it is stored in On after
being decoded and the picture previously stored in On is presented. Subsequent B-pictures are decoded and
presented without reordering.

The time at which a presentation unit Pn(k) is presented to the viewer is tpn(k). For presentation units that
are not reordered, tpn(k) is equal to tdn(j) since the access units are decoded instantaneously. For
presentation units that are reordered, tpn(k) and tdn(j) differ by the time that Pn(k) is delayed in the reorder
buffer, which is a multiple of the nominal picture period.
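The reordering rule can be sketched as a small simulation. Assuming pictures arrive in decoding order as a string of type letters, the C function below keeps each I- or P-picture in a single-slot reorder buffer, presents the previously held picture when the next I- or P-picture is decoded, and passes B-pictures straight through. Two simplifications are assumptions of this sketch rather than rules of the system target decoder: every I- or P-picture is treated as needing reordering, and the buffer is flushed at the end of the input.

```c
#include <stddef.h>

/* Convert decoding order to presentation order.
   decode_order: NUL-terminated picture types, e.g. "IPBBPBB".
   out:          buffer of at least strlen(decode_order) + 1 chars. */
static void reorder_pictures(const char *decode_order, char *out) {
    char held = 0;                        /* contents of the reorder buffer, 0 if empty */
    size_t n = 0;
    for (const char *p = decode_order; *p; p++) {
        if (*p == 'I' || *p == 'P') {
            if (held) out[n++] = held;    /* present the previously stored picture */
            held = *p;                    /* store the newly decoded picture */
        } else {
            out[n++] = *p;                /* B-pictures are presented immediately */
        }
    }
    if (held) out[n++] = held;            /* end-of-input flush (assumption) */
    out[n] = '\0';
}
```

With the decoding order IPBBPBB the presentation order becomes IBBPBBP: the first P-picture waits in the buffer while the two B-pictures that reference it are presented.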

Subclause 2.4.1 of ISO/IEC 11172-2 explains reordering of video pictures in greater detail.
Presentation 

The function of a decoding system is to reconstruct presentation units from compressed data and to present
them in a synchronized sequence at the correct presentation times. Although real audio and visual 
presentation devices generally have finite and different delays and may have additional delays imposed by 
post-processing or output functions, the system target decoder models these delays as zero. 

In the system target decoder the display of a video presentation unit (a picture) occurs instantaneously at its 
presentation time, tpn(k). 

In the system target decoder the output of an audio presentation unit starts at its presentation time, tpn(k), 
when the decoder instantaneously presents the first sample. Subsequent samples in the presentation unit are 
presented in sequence at the audio sampling rate. 
2.4.3 Specification of the system stream syntax

The following syntax describes a stream of bytes. 
2.4.3.1 ISO/IEC 11172 Layer 



Syntax                                            No. of bits   Mnemonic

iso11172_stream() {
    do {
        pack()
    } while ( nextbits() == pack_start_code )
    iso_11172_end_code                                 32         bslbf
}



2.4.3.2 Pack Layer 
Pack 



Syntax                                            No. of bits   Mnemonic

pack() {
    pack_start_code                                    32         bslbf
    '0010'                                              4         bslbf
    system_clock_reference [32..30]                     3         bslbf
    marker_bit                                          1         bslbf
    system_clock_reference [29..15]                    15         bslbf
    marker_bit                                          1         bslbf
    system_clock_reference [14..0]                     15         bslbf
    marker_bit                                          1         bslbf
    marker_bit                                          1         bslbf
    mux_rate                                           22         uimsbf
    marker_bit                                          1         bslbf
    if ( nextbits() == system_header_start_code )
        system_header()
    while ( nextbits() == packet_start_code_prefix )
        packet()
}
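Because every field before the optional system header has a fixed size, the pack header occupies exactly 12 bytes. The C sketch below unpacks it directly from a byte array under that layout; the pack_header struct and the function name are illustrative, and validation is limited to the start code, the '0010' bits and the marker_bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a pack header (2.4.3.2); struct and function names are illustrative. */
typedef struct {
    uint64_t scr;       /* 33-bit system_clock_reference */
    uint32_t mux_rate;  /* 22-bit mux_rate, in units of 50 bytes/s */
} pack_header;

/* Parse the 12-byte pack header at p, starting with pack_start_code 0x000001BA.
   Returns 0 on success, -1 on a short buffer or a wrong fixed bit. */
static int parse_pack_header(const uint8_t *p, size_t len, pack_header *h) {
    if (len < 12) return -1;
    if (p[0] != 0x00 || p[1] != 0x00 || p[2] != 0x01 || p[3] != 0xBA) return -1;
    if ((p[4] & 0xF0) != 0x20) return -1;                         /* '0010' */
    if (!(p[4] & 0x01) || !(p[6] & 0x01) || !(p[8] & 0x01)) return -1;  /* marker_bits */
    h->scr = ((uint64_t)((p[4] >> 1) & 0x07) << 30)   /* SCR[32..30] */
           | ((uint64_t)p[5] << 22)                    /* SCR[29..22] */
           | ((uint64_t)(p[6] >> 1) << 15)             /* SCR[21..15] */
           | ((uint64_t)p[7] << 7)                     /* SCR[14..7]  */
           |  (uint64_t)(p[8] >> 1);                   /* SCR[6..0]   */
    if (!(p[9] & 0x80) || !(p[11] & 0x01)) return -1;  /* marker_bits around mux_rate */
    h->mux_rate = ((uint32_t)(p[9] & 0x7F) << 15)
                | ((uint32_t)p[10] << 7)
                | ((uint32_t)p[11] >> 1);
    return 0;
}
```

After a successful parse the caller would continue at byte 12, where either a system header or the first packet start code is expected.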
System header

Syntax                                            No. of bits   Mnemonic

system_header() {
    system_header_start_code                           32         bslbf
    header_length                                      16         uimsbf
    marker_bit                                          1         bslbf
    rate_bound                                         22         uimsbf
    marker_bit                                          1         bslbf
    audio_bound                                         6         uimsbf
    fixed_flag                                          1         bslbf
    CSPS_flag                                           1         bslbf
    system_audio_lock_flag                              1         bslbf
    system_video_lock_flag                              1         bslbf
    marker_bit                                          1         bslbf
    video_bound                                         5         uimsbf
    reserved_byte                                       8         bslbf
    while ( nextbits() == '1' ) {
        stream_id                                       8         uimsbf
        '11'                                            2         bslbf
        STD_buffer_bound_scale                          1         bslbf
        STD_buffer_size_bound                          13         uimsbf
    }
}






2.4.3.3 Packet Layer



Syntax                                            No. of bits   Mnemonic

packet() {
    packet_start_code_prefix                           24         bslbf
    stream_id                                           8         uimsbf
    packet_length                                      16         uimsbf
    if ( stream_id != private_stream_2 ) {
        while ( nextbits() == '1111 1111' )
            stuffing_byte                               8         bslbf
        if ( nextbits() == '01' ) {
            '01'                                        2         bslbf
            STD_buffer_scale                            1         bslbf
            STD_buffer_size                            13         uimsbf
        }
        if ( nextbits() == '0010' ) {
            '0010'                                      4         bslbf
            presentation_time_stamp [32..30]            3         bslbf
            marker_bit                                  1         bslbf
            presentation_time_stamp [29..15]           15         bslbf
            marker_bit                                  1         bslbf
            presentation_time_stamp [14..0]            15         bslbf
            marker_bit                                  1         bslbf
        }
        else if ( nextbits() == '0011' ) {
            '0011'                                      4         bslbf
            presentation_time_stamp [32..30]            3         bslbf
            marker_bit                                  1         bslbf
            presentation_time_stamp [29..15]           15         bslbf
            marker_bit                                  1         bslbf
            presentation_time_stamp [14..0]            15         bslbf
            marker_bit                                  1         bslbf
            '0001'                                      4         bslbf
            decoding_time_stamp [32..30]                3         bslbf
            marker_bit                                  1         bslbf
            decoding_time_stamp [29..15]               15         bslbf
            marker_bit                                  1         bslbf
            decoding_time_stamp [14..0]                15         bslbf
            marker_bit                                  1         bslbf
        }
        else
            '0000 1111'                                 8         bslbf
    }
    for ( i = 0; i < N; i++ ) {
        packet_data_byte                                8         bslbf
    }
}
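The optional parts of the packet header are selected by their leading bit patterns, so a parser can follow the syntax above byte by byte. The C sketch below assumes the packet is held in memory starting at its packet_start_code_prefix; the packet_header struct, read_ts and parse_packet_header are hypothetical names, and rejecting more than sixteen stuffing bytes follows the stuffing_byte semantics in 2.4.4.3.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields recovered from a packet header (2.4.3.3); names are illustrative. */
typedef struct {
    uint8_t  stream_id;
    uint16_t packet_length;    /* bytes remaining after the packet_length field */
    int      has_std;          /* STD_buffer_scale/STD_buffer_size present */
    unsigned std_buffer_scale;
    unsigned std_buffer_size;
    int      has_pts, has_dts;
    uint64_t pts, dts;         /* 33-bit time stamps */
    size_t   data_offset;      /* offset of the first packet_data_byte */
} packet_header;

/* Decode 5 bytes laid out as 4 prefix bits, [32..30], marker, [29..15], marker,
   [14..0], marker. Returns -1 if a marker_bit is not '1'. */
static int read_ts(const uint8_t *q, uint64_t *ts) {
    if (!(q[0] & 1) || !(q[2] & 1) || !(q[4] & 1)) return -1;
    *ts = ((uint64_t)((q[0] >> 1) & 0x07) << 30)
        | ((uint64_t)q[1] << 22) | ((uint64_t)(q[2] >> 1) << 15)
        | ((uint64_t)q[3] << 7)  |  (uint64_t)(q[4] >> 1);
    return 0;
}

/* Returns 0 on success, -1 on malformed or truncated input. */
static int parse_packet_header(const uint8_t *p, size_t len, packet_header *h) {
    if (len < 6 || p[0] != 0x00 || p[1] != 0x00 || p[2] != 0x01) return -1;
    h->stream_id = p[3];
    h->packet_length = (uint16_t)((p[4] << 8) | p[5]);
    h->has_std = h->has_pts = h->has_dts = 0;
    size_t i = 6;
    if (h->stream_id != 0xBF) {                    /* private_stream_2: data follows at once */
        int stuffing = 0;
        while (i < len && p[i] == 0xFF) {          /* stuffing_byte */
            if (++stuffing > 16) return -1;        /* at most sixteen per packet header */
            i++;
        }
        if (i + 2 <= len && (p[i] & 0xC0) == 0x40) {        /* '01' */
            h->has_std = 1;
            h->std_buffer_scale = (p[i] >> 5) & 1u;
            h->std_buffer_size = ((p[i] & 0x1Fu) << 8) | p[i + 1];
            i += 2;
        }
        if (i >= len) return -1;
        unsigned prefix = p[i] & 0xF0u;
        if (prefix == 0x20 || prefix == 0x30) {    /* '0010': PTS; '0011': PTS and DTS */
            if (i + (prefix == 0x30 ? 10u : 5u) > len) return -1;
            if (read_ts(&p[i], &h->pts) != 0) return -1;
            h->has_pts = 1;
            i += 5;
            if (prefix == 0x30) {
                if ((p[i] & 0xF0) != 0x10) return -1;       /* '0001' */
                if (read_ts(&p[i], &h->dts) != 0) return -1;
                h->has_dts = 1;
                i += 5;
            }
        } else if (p[i] == 0x0F) {                 /* '0000 1111': no time stamps */
            i += 1;
        } else {
            return -1;
        }
    }
    h->data_offset = i;
    return 0;
}
```

The number of packet_data_bytes, N, then follows as packet_length minus (data_offset - 6).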
2.4.4 Semantic definition of fields in syntax

2.4.4.1 ISO/IEC 11172 Layer 

iso_11172_end_code - The iso_11172_end_code is the bit string "0000 0000 0000 0000 0000 0001
1011 1001" (000001B9 in hexadecimal). It terminates the ISO/IEC 11172 multiplexed stream.

2.4.4.2 Pack Layer 
Pack 

pack_start_code - The pack_start_code is the bit string "0000 0000 0000 0000 0000 0001 1011 1010"
(000001BA in hexadecimal). It identifies the beginning of a pack.

system_clock_reference - The system_clock_reference (SCR) is a 33-bit number coded in three
separate fields. It indicates the intended time of arrival of the last byte of the system_clock_reference field at
the input of the system target decoder. The value of the SCR is measured in the number of periods of a
90 kHz system clock with a tolerance specified in 2.4.2. Using the notation of 2.4.2 the value encoded in the
system_clock_reference is:

SCR(i) = NINT (system_clock_frequency * tm(i)) % 2^33

for i such that M(i) is the last byte of the coded system_clock_reference field.
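The same encoding, NINT(system_clock_frequency * t) % 2^33, is used for the SCR here and for the PTS and DTS in 2.4.4.3. The C sketch below computes it for a non-negative time in seconds; restricting t to non-negative values keeps the % operator on positive numbers, as 2.2.1 requires, and the function name is illustrative.

```c
#include <stdint.h>

/* 33-bit time stamp for a time t >= 0 seconds:
   NINT(system_clock_frequency * t) % 2^33. */
static uint64_t encode_timestamp(double t, double system_clock_frequency) {
    double ticks = system_clock_frequency * t;   /* assumed >= 0 and < 2^53 */
    uint64_t n = (uint64_t)ticks;                /* integer part */
    if (ticks - (double)n >= 0.5) n += 1;        /* NINT: halves round up for ticks >= 0 */
    return n % (UINT64_C(1) << 33);              /* keep the low 33 bits */
}
```

For example, a time of 1 s at the nominal 90 kHz clock encodes as 90 000, and any count of 2^33 or more wraps back toward zero.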

marker_bit - A marker_bit is a one-bit field that has the value "1".

mux_rate - This is a positive integer specifying the rate at which the system target decoder receives the
ISO/IEC 11172 multiplexed stream during the pack in which it is included. The value of mux_rate is
measured in units of 50 bytes/s, rounded upwards. The value zero is forbidden. The value represented in
mux_rate is used to define the time of arrival of bytes at the input to the system target decoder in 2.4.2.
The value encoded in the mux_rate field may vary from pack to pack in an ISO/IEC 11172 multiplexed
stream.

System Header 

system_header_start_code - The system_header_start_code is the bit string "0000 0000 0000 0000
0000 0001 1011 1011" (000001BB in hexadecimal). It identifies the beginning of a system header.

header_length - The header_length shall be equal to the number of bytes in the system header following
the header_length field. Note that future extensions of this part of ISO/IEC 11172 may extend the system
header.

rate_bound - The rate_bound is an integer value greater than or equal to the maximum value of the
mux_rate field coded in any pack of the ISO/IEC 11172 multiplexed stream. It may be used by a decoder to
assess whether it is capable of decoding the entire stream.

audio_bound - The audio_bound is an integer, in the inclusive range from 0 to 32, greater than or equal
to the maximum number of ISO/IEC 11172 audio streams in the ISO/IEC 11172 multiplexed stream for
which the decoding processes are simultaneously active. For the purpose of this clause, the decoding
process of an MPEG audio stream is active if the STD buffer is not empty, or if the decoded access unit is
being presented in the STD model.

fixed_flag - The fixed_flag is a one-bit flag. If its value is set to "1" fixed bitrate operation is indicated.
If its value is set to "0" variable bitrate operation is indicated. During fixed bitrate operation, the value
encoded in all system_clock_reference fields in the multiplexed ISO/IEC 11172 stream shall adhere to the
following linear equation:

SCR(i) = NINT (c1 * i + c2) % 2^33

where

c1 is a real-valued constant valid for all i;
c2 is a real-valued constant valid for all i;
i is the index in the ISO/IEC 11172 multiplexed stream of the final byte of any
system_clock_reference field in the stream.
CSPS_flag - The CSPS_flag is a one-bit flag. If its value is set to "1" the ISO/IEC 11172
multiplexed stream meets the constraints defined in 2.4.6.

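system_audio_lock_flag - The system_audio_lock_flag is a one-bit flag indicating that there is a
specified, constant rational relationship between the audio sampling rate and the system clock frequency in
the system target decoder. The system_clock_frequency is defined in 2.4.2 and the audio sampling rate is
specified in ISO/IEC 11172-3. The system_audio_lock_flag may only be set to "1" if, for all presentation
units in all audio elementary streams in the ISO/IEC 11172 multiplexed stream, the ratio of
system_clock_frequency to the actual audio sampling rate, SCASR, is constant and equal to the value
indicated in the following table at the nominal sampling rate indicated in the audio stream.

                  system_clock_frequency
        SCASR = --------------------------------
                 audio sample rate in the STD

The fraction bar denotes real division.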



Nominal audio sampling frequency (kHz)    32               44,1             48
Ratio SCASR                               90 000/32 000    90 000/44 100    90 000/48 000



system_video_lock_flag - The system_video_lock_flag is a one-bit flag indicating that there is a
specified, constant rational relationship between the video picture rate and the system clock frequency in the
system target decoder. The system_clock_frequency is defined in 2.4.2 and the video picture rate is
specified in ISO/IEC 11172-2. The system_video_lock_flag may only be set to "1" if, for all presentation
units in all video elementary streams in the ISO/IEC 11172 multiplexed stream, the ratio of
system_clock_frequency to the actual video picture rate, SCPR, is constant and equal to the value indicated
in the following table at the nominal picture rate indicated in the video stream.

                 system_clock_frequency
        SCPR = ---------------------------
                picture rate in the STD



Nominal picture rate (Hz)   23,976     24      25      29,97   30      50      59,94     60
Ratio SCPR                  15 015/4   3 750   3 600   3 003   3 000   1 800   3 003/2   1 500



The values of SCPR are exact. The actual picture rate differs slightly from the nominal rate in
cases where the nominal rate is 23,976, 29,97, or 59,94 pictures per second.
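The SCASR and SCPR entries are exact rational numbers, which can be reproduced with integer arithmetic. The C sketch below assumes that the nominal rates 23,976, 29,97 and 59,94 Hz stand for 24 000/1 001, 30 000/1 001 and 60 000/1 001 Hz; that reading is consistent with the table (for example 90 000 * 1 001 / 24 000 = 15 015/4). The ratio type and function names are illustrative.

```c
#include <stdint.h>

/* A reduced fraction num/den; illustrative helper type. */
typedef struct { uint64_t num, den; } ratio;

static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* 90 000 Hz divided by a rate of rate_num/rate_den Hz, as a reduced fraction:
   (90 000 * rate_den) / rate_num. */
static ratio clock_ratio(uint64_t rate_num, uint64_t rate_den) {
    uint64_t num = 90000u * rate_den, den = rate_num;
    uint64_t g = gcd_u64(num, den);
    ratio r = { num / g, den / g };
    return r;
}
```

For the audio table, clock_ratio(44100, 1) reduces 90 000/44 100 to 100/49.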

video_bound - The video_bound is an integer, in the inclusive range from 0 to 16, greater than or equal
to the maximum number of ISO/IEC 11172 video streams in the ISO/IEC 11172 multiplexed stream for
which the decoding processes are simultaneously active. For the purpose of this clause, the decoding
process of an ISO/IEC 11172 video stream is active if the STD buffer is not empty, or if the decoded
access unit is being presented in the STD model, or if the reorder buffer is not empty.

reserved_byte - This byte is reserved for future use by ISO/IEC. Until otherwise specified by ISO/IEC it shall have the value "1111 1111".

stream_id - The stream_id indicates the type and number of the stream to which the following
STD_buffer_bound_scale and STD_buffer_size_bound fields refer.

If stream_id equals "1011 1000" the STD_buffer_bound_scale and STD_buffer_size_bound fields following
the stream_id refer to all audio streams in the ISO/IEC 11172 multiplexed stream.

If stream_id equals "1011 1001" the STD_buffer_bound_scale and STD_buffer_size_bound fields following
the stream_id refer to all video streams in the ISO/IEC 11172 multiplexed stream.

If the stream_id takes on any other value, it shall be a byte value greater than or equal to "1011 1100" and
shall be interpreted as referring to the stream type and number according to table 1. This table is used
to identify the stream type and number indicated by the stream_id defined in 2.4.4.3.
Table 1 - stream_id table

stream_id     stream type
1011 1100     reserved stream
1011 1101     private_stream_1
1011 1110     padding stream
1011 1111     private_stream_2
110x xxxx     ISO/IEC 11172-3 audio stream - number xxxxx
1110 xxxx     ISO/IEC 11172-2 video stream - number xxxx
1111 xxxx     reserved data stream - number xxxx

The notation x means that the values 0 and 1 are both permitted and result in the same stream type. The
stream number is given by the values taken by the x's.
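Table 1 can be applied with simple bit masks. The C sketch below classifies a packet stream_id and extracts the stream number from the x bits. The enum and function names are illustrative; the system-header-only values "1011 1000" (all audio streams) and "1011 1001" (all video streams) are not entries of table 1, so this sketch reports them, like any value below "1011 1100", as invalid.

```c
#include <stdint.h>

/* Stream types of table 1; names are illustrative. */
typedef enum {
    STREAM_RESERVED, STREAM_PRIVATE_1, STREAM_PADDING, STREAM_PRIVATE_2,
    STREAM_AUDIO, STREAM_VIDEO, STREAM_RESERVED_DATA, STREAM_INVALID
} stream_type;

/* Classify a stream_id; *number receives the stream number (the x bits)
   for audio, video and reserved data streams, otherwise 0. */
static stream_type classify_stream_id(uint8_t id, unsigned *number) {
    *number = 0;
    if ((id & 0xE0) == 0xC0) { *number = id & 0x1F; return STREAM_AUDIO; }         /* 110x xxxx */
    if ((id & 0xF0) == 0xE0) { *number = id & 0x0F; return STREAM_VIDEO; }         /* 1110 xxxx */
    if ((id & 0xF0) == 0xF0) { *number = id & 0x0F; return STREAM_RESERVED_DATA; } /* 1111 xxxx */
    switch (id) {
    case 0xBC: return STREAM_RESERVED;    /* 1011 1100 */
    case 0xBD: return STREAM_PRIVATE_1;   /* 1011 1101 */
    case 0xBE: return STREAM_PADDING;     /* 1011 1110 */
    case 0xBF: return STREAM_PRIVATE_2;   /* 1011 1111 */
    default:   return STREAM_INVALID;     /* not a table 1 value */
    }
}
```

For example, stream_id 0xC3 ("1100 0011") identifies ISO/IEC 11172-3 audio stream number 3.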



Each elementary stream present in the ISO/IEC 11172 multiplexed stream shall have its
STD_buffer_bound_scale and STD_buffer_size_bound specified exactly once by this mechanism in each
system header.

STD_buffer_bound_scale - The STD_buffer_bound_scale is a one-bit field that indicates the scaling
factor used to interpret the subsequent STD_buffer_size_bound field. If the preceding stream_id indicates an
audio stream, STD_buffer_bound_scale shall have the value "0". If the preceding stream_id indicates a
video stream, STD_buffer_bound_scale shall have the value "1". For all other stream types, the value of
the STD_buffer_bound_scale may be either "1" or "0".

STD_buffer_size_bound - The STD_buffer_size_bound is a 13-bit unsigned integer defining a value
greater than or equal to the maximum system target decoder input buffer size, BSn, over all packets for
stream n in the ISO/IEC 11172 stream. If STD_buffer_bound_scale has the value "0" then
STD_buffer_size_bound measures the buffer size bound in units of 128 bytes. If STD_buffer_bound_scale
has the value "1" then STD_buffer_size_bound measures the buffer size bound in units of 1024 bytes.
Thus:

if (STD_buffer_bound_scale == 0)
    BSn <= STD_buffer_size_bound * 128;
else
    BSn <= STD_buffer_size_bound * 1024;
2.4.4.3 Packet Layer 

packet_start_code_prefix - The packet_start_code_prefix is a 24-bit code. Together with the
stream_id that follows, it constitutes a packet start code that identifies the beginning of a packet. The
packet_start_code_prefix is the bit string "0000 0000 0000 0000 0000 0001" (000001 in hexadecimal).

stream_id - The stream_id specifies the type and number of the elementary stream as defined by the
stream_id table, table 1 in 2.4.4.2. Each elementary stream in an ISO/IEC 11172 multiplexed stream shall
have a unique stream_id.

packet_length - The packet_length field specifies the number of bytes remaining in the packet after the packet_length field.

stuffing_byte - This is a fixed 8-bit value equal to "1111 1111" that can be inserted by the encoder, for
example to meet the requirements of the digital storage medium. It is discarded by the decoder. No more
than sixteen stuffing bytes shall be present in one packet header.

STD_buffer_scale - The STD_buffer_scale is a one-bit field that indicates the scaling factor used to
interpret the subsequent STD_buffer_size field. If the preceding stream_id indicates an audio stream,
STD_buffer_scale shall have the value "0". If the preceding stream_id indicates a video stream,
STD_buffer_scale shall have the value "1". For all other stream types, the value may be either "1" or "0".

STD_buffer_size - The STD_buffer_size is a 13-bit unsigned integer defining the size of the input
buffer, BS_n, in the system target decoder. If STD_buffer_scale has the value "0", then the STD_buffer_size
measures the buffer size in units of 128 bytes. If STD_buffer_scale has the value "1", then the
STD_buffer_size measures the buffer size in units of 1024 bytes. Thus:

    if (STD_buffer_scale == 0)
        BS_n = STD_buffer_size * 128;
    else
        BS_n = STD_buffer_size * 1024;

The encoded value of the STD buffer size takes effect immediately when the STD_buffer_size field is
received by the MPEG system target decoder.
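
A minimal sketch of how a parser might turn the two fields into BS_n. It assumes, from the packet header syntax in 2.4.3.3 (not part of this excerpt), that the fields occupy two bytes beginning with the bits "01", followed by the 1-bit STD_buffer_scale and the 13-bit STD_buffer_size; the function name is hypothetical.

```python
def parse_std_buffer_field(b0: int, b1: int) -> int:
    """Decode the 2-byte '01' + STD_buffer_scale + STD_buffer_size group.

    Returns BS_n in bytes.
    """
    if (b0 >> 6) != 0b01:
        raise ValueError("STD buffer group must start with the bits '01'")
    scale = (b0 >> 5) & 0x01               # STD_buffer_scale
    size = ((b0 & 0x1F) << 8) | b1         # 13-bit STD_buffer_size
    return size * (1024 if scale else 128)
```

For instance, the bytes 0x60 0x2E (scale "1", size 46) describe a 46 * 1024 byte video buffer.
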

presentation_time_stamp - The presentation_time_stamp (PTS) is a 33-bit number coded in three
separate fields. It indicates the intended time of presentation in the system target decoder of the presentation
unit that corresponds to the first access unit that commences in the packet. The value of PTS is measured
in the number of periods of a 90 kHz system clock with a tolerance specified in 2.4.2. Using the notation
of 2.4.2, the value encoded in the presentation_time_stamp is:

    PTS = NINT(system_clock_frequency * tp_n(k)) % 2^33

where

tp_n(k) is the presentation time of presentation unit P_n(k).

P_n(k) is the presentation unit corresponding to the first access unit that commences in the packet
data. An access unit commences in the packet if the first byte of a video picture start code or the
first byte of the synchronization word of an audio frame (see ISO/IEC 11172-2 and ISO/IEC
11172-3) is present in the packet data.

If there is filtering in audio, it is assumed by the system model that filtering introduces no delay; hence the
sample referred to by PTS at encoding is the same sample referred to by PTS at decoding.
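
The PTS formula can be evaluated directly. The sketch below also shows one way to split the 33-bit value into the three separate fields; it assumes the 5-byte layout of the packet syntax in 2.4.3.3 (not part of this excerpt): a 4-bit prefix ("0010" when only a PTS is coded), then bits 32..30, 29..15 and 14..0, each group followed by a marker bit "1". NINT is implemented as round-half-up, which is an assumption about how ties are resolved; the function names are hypothetical.

```python
import math

SYSTEM_CLOCK_FREQUENCY = 90_000   # Hz, nominal
TIME_STAMP_MODULUS = 1 << 33      # PTS and DTS are 33-bit values


def nint(x: float) -> int:
    """Nearest integer with halves rounded up (round() rounds halves to even)."""
    return math.floor(x + 0.5)


def pts_from_seconds(tp: float) -> int:
    """PTS = NINT(system_clock_frequency * tp_n(k)) % 2^33."""
    return nint(SYSTEM_CLOCK_FREQUENCY * tp) % TIME_STAMP_MODULUS


def encode_time_stamp(value: int, prefix: int) -> bytes:
    """Pack a 33-bit time stamp as prefix(4) bits[32..30] '1'
    bits[29..15] '1' bits[14..0] '1' (five bytes)."""
    if not 0 <= value < TIME_STAMP_MODULUS:
        raise ValueError("time stamp must fit in 33 bits")
    return bytes([
        ((prefix & 0x0F) << 4) | ((value >> 29) & 0x0E) | 1,
        (value >> 22) & 0xFF,
        ((value >> 14) & 0xFE) | 1,
        (value >> 7) & 0xFF,
        ((value << 1) & 0xFE) | 1,
    ])


def decode_time_stamp(field: bytes) -> int:
    """Inverse of encode_time_stamp; the prefix and marker bits are ignored."""
    return (((field[0] >> 1) & 0x07) << 30 | field[1] << 22
            | (field[2] >> 1) << 15 | field[3] << 7 | field[4] >> 1)
```

Because of the modulo 2^33 operation, a presentation time one second past the wrap point yields the same PTS as a presentation time of one second.
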

decoding_time_stamp - The decoding_time_stamp (DTS) is a 33-bit number coded in three separate
fields. It indicates the intended time of decoding in the system target decoder of the first access unit that
commences in the packet. The value of DTS is measured in the number of periods of a 90 kHz system
clock with a tolerance specified in 2.4.2. Using the notation of 2.4.2, the value encoded in the
decoding_time_stamp is:

    DTS = NINT(system_clock_frequency * td_n(j)) % 2^33

where

td_n(j) is the decoding time of access unit A_n(j).

A_n(j) is the first access unit that commences in the packet data. An access unit commences in the
packet if the first byte of a video picture start code or the first byte of the synchronization word of
an audio frame (see ISO/IEC 11172-2 and ISO/IEC 11172-3) is present in the packet data.

packet_data_byte - packet_data_bytes shall be contiguous bytes of data from the elementary stream
indicated by the packet's stream_id. The byte order of the elementary stream shall be preserved. The
number of packet_data_bytes, N, may be calculated from the packet_length field. N is equal to the value
indicated in the packet_length minus the number of bytes between the last byte of the packet_length field
and the first packet_data_byte.

In the case of a video stream, packet_data_bytes are coded video data as defined in ISO/IEC 11172-2. In the
case of an audio stream, packet_data_bytes are coded audio data as defined in ISO/IEC 11172-3. In the case
of a padding stream, packet_data_bytes consist of padding bytes. Each padding byte is a fixed bit string
with the value "1111 1111". In the case of a private stream (type 1 or type 2), packet_data_bytes are user
definable and will not be defined by ISO/IEC in the future. The contents of packet_data_bytes in reserved
streams may be specified in the future by ISO/IEC.
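
N can be computed once the optional header fields between packet_length and the first packet_data_byte have been skipped. The sketch assumes the packet header syntax of 2.4.3.3 (not part of this excerpt): for every stream except private_stream_2 (stream_id 0xBF), up to sixteen stuffing bytes, an optional 2-byte STD buffer group starting with "01", then either a 5-byte PTS (prefix "0010"), a 10-byte PTS and DTS (prefix "0011"), or the single byte "0000 1111". The helper names are hypothetical.

```python
PRIVATE_STREAM_2 = 0xBF


def header_bytes_after_length(stream_id: int, after_length: bytes) -> int:
    """Count bytes between the packet_length field and the first packet_data_byte.

    after_length -- the packet bytes that follow the 16-bit packet_length field
    """
    if stream_id == PRIVATE_STREAM_2:
        return 0                              # no further header fields
    pos = 0
    while after_length[pos] == 0xFF:          # stuffing_bytes
        pos += 1
    if pos > 16:
        raise ValueError("more than sixteen stuffing bytes")
    if (after_length[pos] >> 6) == 0b01:      # '01' + STD_buffer_scale + STD_buffer_size
        pos += 2
    if (after_length[pos] >> 4) == 0b0010:    # PTS only
        return pos + 5
    if (after_length[pos] >> 4) == 0b0011:    # PTS followed by DTS
        return pos + 10
    if after_length[pos] == 0x0F:             # '0000 1111': no time stamps
        return pos + 1
    raise ValueError("unexpected byte in packet header")


def packet_data_count(stream_id: int, packet_length: int, after_length: bytes) -> int:
    """N = packet_length minus the header bytes that follow packet_length."""
    return packet_length - header_bytes_after_length(stream_id, after_length)
```
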
2.4.5 Restrictions on the multiplexed stream semantics

2.4.5.1 Buffer management 

The ISO/IEC 11172 multiplexed stream, M(i) in the notation described in 2.4.2, shall be constructed and
tm(i) shall be chosen so that the input buffers of size BS_1 through BS_n neither overflow nor underflow in
the system target decoder. That is:

    0 <= F_n(t) <= BS_n for all t and n

and F_n(t) = 0 instantaneously before t = tm(0).

F_n(t) is the instantaneous fullness of STD buffer B_n.

For all ISO/IEC 11172 multiplexed streams, the delay caused by system target decoder input buffering shall
be less than or equal to one second. The input buffering delay is the difference in time between a byte
entering the input buffer and when it is decoded.

Specifically:

    td_n(j) - tm(i) <= 1 s

for all bytes M(i) contained in access unit j.
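
The two constraints of 2.4.5.1 can be checked offline for one elementary stream. The sketch is coarse: each arrival is treated as a lump of bytes at one instant rather than byte by byte, and removals at a given instant are applied before arrivals at the same instant. Both simplifications, and the function names, are assumptions of this illustration.

```python
from typing import Sequence, Tuple


def std_buffer_ok(bs_n: int,
                  arrivals: Sequence[Tuple[float, int]],
                  removals: Sequence[Tuple[float, int]]) -> bool:
    """Check 0 <= F_n(t) <= BS_n at every arrival or removal event.

    arrivals -- (tm, nbytes): data of stream n entering buffer B_n
    removals -- (td_n(j), size of access unit j): instantaneous decoding
    """
    # At equal times, removals (kind 0) sort before arrivals (kind 1).
    events = sorted([(t, 1, n) for t, n in arrivals] +
                    [(t, 0, -n) for t, n in removals])
    fullness = 0  # F_n(t) = 0 before the first byte arrives
    for _, _, delta in events:
        fullness += delta
        if not 0 <= fullness <= bs_n:
            return False
    return True


def input_delay_ok(arrival_times: Sequence[float], td: float,
                   limit: float = 1.0) -> bool:
    """td_n(j) - tm(i) <= 1 s for every byte M(i) of access unit j."""
    return all(td - tm <= limit for tm in arrival_times)
```
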

2.4.5.2 Frequency of coding the system_clock_reference

The ISO/IEC 11172 multiplexed stream, M(i), shall be constructed so that the time interval between the
final bytes of system_clock_reference fields in successive packs is less than or equal to 0,7 s. Thus:

    |tm(i) - tm(i')| <= 0,7 s

for all i and i' where M(i) and M(i') are the last bytes of consecutive system_clock_reference fields.

2.4.5.3 Frequency of presentation_time_stamp coding

The ISO/IEC 11172 multiplexed stream M(i) shall be constructed so that the maximum difference between
coded presentation_time_stamps is 0,7 s. Thus

    |tp_n(k) - tp_n(k'')| <= 0,7 s

for all n and all k, k'' satisfying:

1) P_n(k) and P_n(k'') are presentation units for which presentation_time_stamps are coded;

2) k and k'' are chosen so that there is no presentation unit, P_n(k'), with a coded
presentation_time_stamp and with k < k' < k'';

3) no discontinuity (as defined in 2.4.5.4) exists in elementary stream n between P_n(k) and P_n(k'').
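
For one elementary stream, the PTS rule reduces to a check on consecutive coded values, expressed here in 90 kHz ticks (0,7 s = 63 000 ticks). Pairs that span a discontinuity are exempt. The same function, with no exemptions, can check the SCR spacing of 2.4.5.2. Wrap-around of the 33-bit values is ignored in this sketch, and the names are hypothetical.

```python
from typing import AbstractSet, Sequence

MAX_GAP_TICKS = 63_000  # 0,7 s at the nominal 90 kHz system_clock_frequency


def time_stamp_spacing_ok(values: Sequence[int],
                          discontinuity_before: AbstractSet[int] = frozenset(),
                          max_gap: int = MAX_GAP_TICKS) -> bool:
    """Check consecutive coded time stamps (90 kHz ticks) of one stream.

    values               -- coded PTS values in presentation order (or SCRs)
    discontinuity_before -- indices k such that a discontinuity lies between
                            values[k - 1] and values[k]; that pair is skipped
    """
    for k in range(1, len(values)):
        if k in discontinuity_before:
            continue
        if abs(values[k] - values[k - 1]) > max_gap:
            return False
    return True
```
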

2.4.5.4 Conditional coding of time stamps

For each elementary stream of an ISO/IEC 11172 stream, the presentation_time_stamp shall be encoded in
the packet in which the first access unit of that elementary stream commences. For the purposes of this
clause, a video access unit commences in a packet if the first byte of the picture start code is present in the
packet data (see ISO/IEC 11172-2). An audio access unit commences in a packet if the first byte of the
synchronization word of the audio frame is present in the packet data (see ISO/IEC 11172-3).

A discontinuity exists at the start of presentation unit P_n(k) in an elementary stream n if the presentation
time tp_n(k) is greater than the largest value permissible given the specified tolerance on the
system_clock_frequency. If a discontinuity exists in any elementary audio or video stream in the ISO/IEC
11172 multiplexed stream, a presentation_time_stamp shall be encoded referring to the first access unit after
each discontinuity.

Presentation_time_stamps may be present in any packet header with the following exception. If no access
unit commences in the packet data, a presentation_time_stamp shall not be present in the packet header.
If a presentation_time_stamp is present in a packet header, it shall refer to the presentation unit
corresponding to the first access unit that commences in the packet data.

A decoding_time_stamp shall appear in a packet header if and only if the following two conditions are met: 

1) A presentation_time_stamp is present in the packet header.

2) The decoding time differs from the presentation time. 
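
The presence rules of 2.4.5.4 can be summarised as a small decision function. This is a sketch; its parameters are hypothetical, and it assumes the caller already knows whether an access unit commences in the packet and whether the encoder is obliged to code a PTS there.

```python
from typing import Optional, Tuple


def time_stamp_fields(au_commences: bool, pts: int, dts: int,
                      code_pts: bool) -> Tuple[Optional[int], Optional[int]]:
    """Return the (PTS, DTS) to code in a packet header; None means absent.

    au_commences -- an access unit commences in this packet's data
    pts, dts     -- time stamps of the first access unit commencing in it
    code_pts     -- the encoder's choice; it must be True for the packet with
                    the stream's first access unit and after a discontinuity
    """
    if not au_commences or not code_pts:
        return None, None            # no PTS, hence no DTS either
    return pts, (dts if dts != pts else None)
```
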

2.4.5.5 Frequency of coding STD_buffer_size in packet headers

The STD_buffer_scale and STD_buffer_size fields shall occur in the first packet of each elementary stream
and again whenever the value changes. They may also occur in any other packet.

2.4.5.6 Coding of system header

The system header may be present in any pack, immediately following the pack header. The system header
shall be present in the first pack of an ISO/IEC 11172 multiplexed stream. The values encoded in all the
system headers in the ISO/IEC 11172 multiplexed stream shall be identical.

2.4.6 Constrained system parameter stream

An ISO/IEC 11172 multiplexed stream is a "constrained system parameters stream" (CSPS) if it conforms
to the bounds specified in this clause. ISO/IEC 11172 multiplexed streams are not limited to the bounds
specified by the CSPS. A CSPS may be identified by means of the CSPS_flag defined in the system header
(see 2.4.3.2). The CSPS is a subset of all possible ISO/IEC 11172 multiplexed streams.

Packet rate 

In a CSPS, the maximum rate at which packets shall arrive at the input to the system target decoder is
300 packets/s if the value encoded in the mux_rate field is less than or equal to 5 000 000 bits/s. For
higher bit rates, the CSPS packet rate is bounded by a linear relation to the value encoded in the mux_rate
field.

Specifically, for all packs p in the ISO/IEC 11172 multiplexed stream,

    NP <= (tm(i') - tm(i)) * 300 * max[1, R_max / 5 000 000]

where

    R_max = 8 * 50 * rate_bound bits/s

NP is the number of packet_start_code_prefixes and system_header_start_codes between
adjacent pack_start_codes or between the last pack_start_code and the
iso_11172_end_code.

tm(i) is the time, measured in seconds, encoded in the system_clock_reference of pack p.

tm(i') is the time, measured in seconds, encoded in the system_clock_reference for pack p+1
immediately following pack p or, in the case of the final pack in the ISO/IEC 11172
multiplexed stream, the time of arrival of the last byte of the iso_11172_end_code.
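
Assuming the bound reads NP <= (tm(i') - tm(i)) * 300 * max[1, R_max / 5 000 000] with R_max = 8 * 50 * rate_bound bits/s, a per-pack check could look like this. The function name is hypothetical; tm values would be obtained by dividing SCR values by the 90 kHz system_clock_frequency.

```python
def csps_packet_rate_ok(np_count: int, tm_p: float, tm_next: float,
                        rate_bound: int) -> bool:
    """Check the CSPS packet rate bound for one pack.

    np_count      -- packet and system header start codes in pack p
    tm_p, tm_next -- tm(i) and tm(i') in seconds
    rate_bound    -- rate_bound field of the system header (units of 50 bytes/s)
    """
    r_max = 8 * 50 * rate_bound                     # bits/s
    allowed = (tm_next - tm_p) * 300 * max(1.0, r_max / 5_000_000)
    return np_count <= allowed
```

With rate_bound = 3528 (about 1,4 Mbit/s) the bound is the plain 300 packets/s; with rate_bound = 25000 (10 Mbit/s) it doubles to 600 packets/s.
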

System target decoder buffer size 

In the case of a CSPS the maximum size of each input buffer in the system target decoder is bounded.
Different bounds apply for video elementary streams and audio elementary streams.

In the case of a video elementary stream in a CSPS the following applies:

In 2.4.3.2 of ISO/IEC 11172-2 the horizontal picture size, horizontal_size, and the vertical picture
size, vertical_size, are defined. If the values encoded in horizontal_size and vertical_size do not exceed
the picture size specified for the constrained_parameters_flag in 2.4.3.2 of ISO/IEC 11172-2, then

    BS_n <= 46 * 1024 bytes.

For all other video elementary streams in a CSPS,

    BS_n <= max[46 * 1024, R_vmax * 46 * 1024 / 1 856 000] bytes

where R_vmax is the greatest value of video bit_rate specified or used in the elementary video
stream; reference subclause 2.4.3.2 of ISO/IEC 11172-2.

In the case of an audio elementary stream in a CSPS the following applies: 

    BS_n <= 4 096 bytes.
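
The CSPS buffer limits can be collected in one helper. The 46 * 1024 and 4 096 byte figures come from the text; the scaled video bound for larger pictures is taken as max[46 * 1024, R_vmax * 46 * 1024 / 1 856 000], which is reconstructed from a damaged equation and should be checked against the published standard. R_vmax is assumed to be in bits/s, and the function name is hypothetical.

```python
def csps_buffer_limit(kind: str, constrained_picture_size: bool = True,
                      r_vmax: float = 0.0) -> float:
    """Upper bound on BS_n (bytes) for an elementary stream in a CSPS.

    constrained_picture_size -- horizontal_size/vertical_size meet the
                                constrained_parameters_flag limits
    r_vmax                   -- greatest video bit_rate, assumed in bits/s
    """
    if kind == "audio":
        return 4096
    if kind != "video":
        raise ValueError("2.4.6 gives bounds for audio and video streams only")
    base = 46 * 1024
    if constrained_picture_size:
        return base
    return max(base, r_vmax * base / 1_856_000)
```
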
Annex A

(informative) 

Description of the system coding layer 



A.1 Overview 



The system coding layer carries, together with the coded data, the information needed to support the
decoding and presentation of decoded information. These functions include the synchronized presentation
of decoded information, decoder start-up and random access, and the management of buffers for coded data.

Synchronization is achieved by means of time-stamps: presentation time-stamps (PTS) and decoding
time-stamps (DTS). System clock reference (SCR) fields are used in conjunction with PTS and DTS fields
for synchronization of audio and video and for buffer management. The use of a common time base, the
system time-clock (STC), to unify the measurement of the timing of coded data (SCR fields) and the timing
of presentation and decoding (PTS and DTS fields) ensures correct synchronization and buffer management.

An ISO/IEC 11172 multiplexed stream combines the coded data of one or more elementary streams. The
specification of the system is written in terms of a reference decoder model, the system target decoder (STD).

A.2 Encoder operations
A.2.1 Degrees of freedom

This part of ISO/IEC 11172 allows a wide latitude in constructing the multiplexed bitstream. Multiple
audio, video, private and padding data streams can be combined in a practical way into a single
multiplexed stream.

Two private stream types are provided. One type is completely private; the other follows the packet
syntax defined in this International Standard, including the fields that support synchronization and buffer
management (see A.2.7). Up to 32 audio streams and 16 video streams may be multiplexed
simultaneously, and an unlimited number of private sub-streams may be carried.

The STD buffer size of the model decoder can be specified individually for each elementary stream.
Elementary stream decoders also need certain information before they can start decoding (for example,
the video sequence header).
The encoder has the option of following a set of specific constraints and setting the Constrained System
Parameter Stream (CSPS) flag, the system_audio_lock_flag, or the system_video_lock_flag. These optional
flags may be set independently of one another. The Constrained System Parameters are a defined subset of
all possible system layer parameters and encoding options. Their purpose is to define a restricted set of
parameters and options that can be decoded by economical decoders while being broad enough in application
to gain widespread use. The system_audio_lock_flag indicates that all the audio streams have an exact
relationship to the system clock frequency. The system_video_lock_flag indicates that all the video streams
have an exact relationship to the system clock frequency.

A.2.2 Synchronization

This part of ISO/IEC 11172 provides for end-to-end synchronization of the complete encoding and decoding
process. This function is provided through the use of time-stamps, particularly Presentation Time-stamps
(PTS). This end-to-end synchronization is illustrated in figure A.1, which includes a prototypical encoder and
a prototypical decoder. While these prototypical encoding and decoding systems are not normative, they
illustrate the functions expected of real systems.
Figure A.1 - Prototype encoder and decoder

In the prototypical encoding system, there is a single system time-clock (STC) which is available to the
audio and video encoders. Audio samples entering the audio encoder are organized into audio presentation
units (PU). Some, but not necessarily all, of the audio PUs have PTS values associated with them, which
are samples of the STC at the time the first sample of the PU is input to the encoder. Likewise, video
pictures enter the video encoder, and the STC values at the times that this occurs are used to create video
PTS fields. SCR values represent the time when the last byte of the SCR field leaves the encoder.

This part of ISO/IEC 11172 specifies the encoder and decoder functions in terms of a reference decoder
model known as the system target decoder (STD). In this model, video pictures and audio presentation units
are presented to the user instantaneously. Actual decoders will generally introduce post-processing and
presentation delays. These decoding delays should not be compensated by real encoders. Real encoders
must generate bitstreams that play correctly on the idealised STD. Doing this may involve, for instance,
choosing the value of the PTS at the time corresponding to the middle of a raster-scanned picture. Such an
offset is acceptable provided that it is constant, does not introduce jitter into the sequence of PTS values,
and the constraints on the bitstream buffering are respected. The delays that occur in any specific real
decoder must be compensated in that decoder, not the encoder.

SCR and PTS fields, and DTS where required, must be inserted by the encoder at intervals not exceeding 0,7
s as measured by the values contained in the fields. The time interval refers to coded data time for SCR
fields, and presentation time for PTS and DTS fields. These fields need not be periodic, and they may be
encoded more frequently than the minimum time specified.

Because clock frequencies generally deviate from their nominal values, the use of independent clocks for the
generation of PTS, DTS and SCR fields would result in synchronization or buffer management problems.
Therefore all the PTS, DTS and SCR fields in the multiplexed stream must be samples of the same STC or
have values that are equivalent to those which would have been obtained from a single clock. It is not
permissible, for example, to use independent clocks to produce the PTS and SCR fields in the various
elementary streams, as is specified via the definition of system_clock_frequency.

While there is a specification for the frequency tolerance of the system_clock_frequency function used to
create the time-stamps, there are no explicit specifications in ISO/IEC 11172-2 and ISO/IEC 11172-3 for
the accuracy of the picture rate, audio sample rate, or bit rate, nor for the jitter of these parameters. This
issue will be addressed in the compliance testing specification.

In practice the picture rate and audio sample rate will not exactly match the nominal rate unless they are
specifically locked to the STC. There is no requirement that these rates should be locked to the
system_clock_frequency. However, if either or both rates are locked and have an exact relationship to the
system_clock_frequency, the encoder may record this by setting the system_audio_lock_flag or the
system_video_lock_flag, as appropriate.

The rates implied by the PTS fields should closely match the nominal rates indicated in the elementary
data streams in order to avoid problems in decoders.

A.2.3 Multiplexing

Data from elementary streams are kept distinct by the use of packets, each having a packet start code that
includes a stream identifier. A packet never contains data from more than one elementary stream, and byte
ordering is preserved. Thus, after removing the packet headers, packet data from all packets with a common
stream identifier are concatenated to recover a single elementary stream.

The encoder decides how the multiplex is constructed (the size of packets and the relative placement of
packets from different streams). The multiplex is constrained primarily by the STD model. Short packets
require less STD buffering but more system coding overhead than large packets. Other considerations,
such as the need to align packets with sectors on specific storage media, may influence the choice of
packet length. A discussion of these factors is given for the particular case of CD-ROM.

Multiplexing may be performed in conjunction with the encoding of elementary streams or as a separate
operation. If multiplexing is combined with coding, then the system is free to use the full range of the
STD buffer. If multiplexing is independent from the coding, the elementary encoders must allow
sufficient space in the STD buffers to allow for multiplexing. In the case of CD-ROM, a headroom of
6 * 1024 bytes is generally sufficient. This is why the buffering limit is 40 * 1024 bytes in the video
constrained parameters (ISO/IEC 11172-2) but 46 * 1024 bytes in the constrained system parameters
stream (2.4.6).

Coding for use with a bursty DSM or channel in general requires additional buffering in the STD model
beyond that required with a constant-latency DSM or channel. The additional buffering required may be
reduced through the careful use of multiplexing and the mux_rate field. The STD uses a byte arrival
schedule specified by the SCR and mux_rate fields. In some cases the STD byte arrival schedule can be
made to duplicate the actual delivery schedule of the bursty DSM or channel, permitting optimization of






ISO/IEC 11172-1: 1993(E) 



© ISO/IEC 



Figure A.2 - STD data input arrival schedule [bytes against time; STD data input arrival schedule with mux_rate > average rate between SCRs]
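
Figure A.2 can be reproduced numerically. This sketch assumes the STD byte arrival rule of 2.4.2 (not part of this excerpt): the final byte of an SCR field arrives at the time coded in that SCR, and neighbouring bytes arrive at intervals of 1 / (mux_rate * 50) seconds. When mux_rate exceeds the average rate between SCRs, the bytes arrive in a burst and the input is idle for the rest of the interval. The function names are hypothetical.

```python
SYSTEM_CLOCK_FREQUENCY = 90_000  # Hz, nominal


def byte_arrival_time(i: int, i_scr: int, scr: int, mux_rate: int) -> float:
    """Arrival time tm(i), in seconds, of byte i of the multiplexed stream.

    i_scr    -- index of the final byte of the most recent SCR field
    scr      -- value coded in that SCR field (90 kHz ticks)
    mux_rate -- mux_rate field of the pack, in units of 50 bytes/s
    """
    return scr / SYSTEM_CLOCK_FREQUENCY + (i - i_scr) / (mux_rate * 50)


def idle_time_between_scrs(scr: int, next_scr: int, bytes_between: int,
                           mux_rate: int) -> float:
    """Total time in the SCR interval during which no bytes arrive.

    bytes_between -- bytes delivered between the two SCR final bytes
    """
    burst = bytes_between / (mux_rate * 50)
    return (next_scr - scr) / SYSTEM_CLOCK_FREQUENCY - burst
```

For example, 8 820 bytes delivered at mux_rate = 3528 (176 400 bytes/s) take 0,05 s, so an SCR interval of 0,1 s leaves the STD input idle for 0,05 s.
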
A.2.4 Encoder constraints caused by decoder buffering

The STD model specifies the times when each byte of coded data enters and leaves each buffer, in terms of
the byte arrival schedule given by the SCR and mux_rate fields and the decoding times of the access units.
In a CSPS, the audio buffer size in the STD is limited to 4 096 bytes. For video, the buffer size in the STD
depends on both the picture size and the video bit rate (see 2.4.6); for the larger picture sizes it is

    BS_n <= max[46 * 1024, R_vmax * 46 * 1024 / 1 856 000] bytes
A.2.5 Stream characterization

The System Header is a special packet that contains no elementary stream data. Instead it indicates decoding
requirements for each of the elementary streams. It indicates a number of limits that apply to the entire
ISO/IEC 11172 multiplexed stream, such as data rate, the number of audio and video streams, and the STD
buffer size limits for the individual elementary streams. A decoding system may use these limits to
establish its ability to play the stream.

The system header contains a flag indicating whether or not the data stream is encoded for constant rate
delivery to the STD. If the data rate averaged over the time intervals between SCRs is constant throughout
the stream and, when rounded upwards in units of 50 bytes/s, is equal to the value in the mux_rate field, the
constant rate flag may be set. If the mux_rate fields indicate a rate higher than this, data is delivered to the
STD in bursts at the rates indicated by the mux_rate fields. The mux_rate field will never be lower than
that implied by the SCR fields.
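
The rounding rule for mux_rate and the condition for the constant rate flag can be written out as follows. The per-interval rates are assumed to be computed by the caller, a small relative tolerance stands in for "constant", and the function names are hypothetical.

```python
import math
from typing import Sequence


def mux_rate_for(bytes_per_second: float) -> int:
    """Round a data rate upwards to mux_rate units of 50 bytes/s."""
    return math.ceil(bytes_per_second / 50)


def constant_rate_flag_allowed(interval_rates: Sequence[float],
                               mux_rate: int) -> bool:
    """interval_rates -- data rate in bytes/s averaged over each interval
    between consecutive SCRs."""
    if not interval_rates:
        return False
    first = interval_rates[0]
    constant = all(math.isclose(r, first, rel_tol=1e-9) for r in interval_rates)
    return constant and mux_rate_for(first) == mux_rate
```
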

The system header must be in the first pack of the ISO/IEC 11172 multiplexed stream. It may be repeated
within the stream as often as necessary. In broadcast applications this may be desirable.

Real-time encoding systems must calculate suitable limits for the values in the header before starting to 
encode. Non-real-time encoders may make two passes over the data to find suitable values. 



A.2.6 Padding stream

A padding stream is provided. It may be used to maintain a constant total data rate, to achieve sector 
alignment, or to prevent buffer underflow. As the padding stream is not associated with decoding and 
presentation, it has neither a buffer in the STD model nor PTS or DTS fields. 

Stuffing of up to 16 bytes is allowed within each data packet. This can be used for purposes similar to that
of the padding stream and is well suited to providing word (16-bit) or long word (32-bit) alignment in
applications where 8-bit alignment is not sufficient. Use of stuffing bytes is the only available method of
padding when the number of bytes required for stuffing is less than the minimum size of the padding packet,
which is equal to the size of the stream header.



A.2.7 Insertion of private data

Two private stream types, private_stream_1 and private_stream_2, are provided for applications not defined
in ISO/IEC 11172. Private_stream_1 follows the same syntax as audio and video streams. It may contain
stuffing bytes, a buffer size field, and PTS and DTS fields. The use of these fields is not specified in
ISO/IEC 11172. Private_stream_2 is similar except that no syntax is specified for stuffing bytes, buffer
sizes, PTS or DTS fields.

Although only two private stream identifiers are provided, private streams may be designed to include 
branching fields to support an unlimited number of private sub-streams. This mechanism is not defined in 
ISO/IEC 11172. 

A.3 Decoder operations

Figures A.3 and A.4 show two different models of an implementation of a decoding system that are used in
the following clauses to illustrate the operation of the system. Both models represent possible
implementations.
A.3.1 Decoder synchronization
A.3.1.1 Time-stamps

The PTS, DTS and SCR fields are the basis for synchronization in decoders. Decoders parse the data stream
and extract the PTS or DTS fields contained in the packet coding layer together with the relevant coded data.
The PTS and DTS fields are associated with the first access unit (AU) that commences in a packet
containing a PTS and/or DTS field. Picture start codes and audio syncwords are not necessarily located at
the start of packets, and there may be more than one AU commencing in a packet.

PTS and DTS fields are not necessarily encoded for each picture or audio PU. They are required to occur 
with intervals not exceeding 0,7 s. This bound allows the construction of a control loop using the PTS 
values which has guaranteed stability with a known bandwidth. For those PUs for which PTS is not 
encoded, the decoder can approximate the correct value as the sum of the most recent PTS and an increment. 
The increment is the nominal number of system_clock_frequency cycles per PU times the number of PUs
since the last PTS. 
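
A sketch of the PTS extrapolation just described. ticks_per_pu is the nominal number of 90 kHz cycles per PU; for example 3 600 for 25 pictures/s, or 1152 * 90 000 / 44 100 for audio frames of 1152 samples at 44,1 kHz (example values chosen for illustration). Rounding to the nearest tick and the function name are assumptions of this sketch.

```python
import math

PTS_MODULUS = 1 << 33


def estimate_pts(last_pts: int, pus_since_last: int, ticks_per_pu: float) -> int:
    """Approximate the PTS of a PU that has no coded PTS.

    last_pts       -- most recent coded PTS
    pus_since_last -- number of PUs since the PU carrying last_pts
    ticks_per_pu   -- nominal system_clock_frequency cycles per PU
    """
    return math.floor(last_pts + pus_since_last * ticks_per_pu + 0.5) % PTS_MODULUS
```
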

DTSs specify the time at which all the bytes of an access unit are removed from the buffer of an elementary 
stream decoder in the STD model. The STD model assumes instantaneous decoding of access units. In 
audio streams, and for B-pictures in video streams, the decoding time is the same as the presentation time 
and so only the PTSs are encoded; DTS values are implied. In video streams, for I-pictures and P-pictures 
the DTS values are nominally equal to the PTS value minus the number of picture periods of video 
reordering delay multiplied by the picture period, in units of the 90 kHz STC. The DTS and PTS need not
be encoded for every access unit. Intervening values may be calculated from known DTS and PTS values
and the rate of PUs for each stream. 
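
For I- and P-pictures the implied DTS follows from the PTS and the reordering delay. In this sketch the reordering delay (in pictures) and the picture period (in 90 kHz ticks, e.g. 3 600 at 25 pictures/s) are supplied by the caller; only I-, P- and B-pictures are handled, and the function name is hypothetical.

```python
PTS_MODULUS = 1 << 33


def implied_dts(pts: int, picture_type: str, reorder_delay_pictures: int,
                picture_period_ticks: int) -> int:
    """Nominal DTS of a picture whose DTS is not coded."""
    if picture_type == "B":
        return pts                   # decoded and presented at the same time
    if picture_type in ("I", "P"):
        return (pts - reorder_delay_pictures * picture_period_ticks) % PTS_MODULUS
    raise ValueError("this sketch only handles I-, P- and B-pictures")
```
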

Similarly, SCR values, which measure the time of events in the coded data stream, are required to occur 
with intervals not exceeding 0,7 s. Again, this allows construction of a controller using SCR values with 
a guaranteed stability. 

A.3.1.2 Clock relationships 

A decoding system, including all of the synchronized decoders and the source of the coded data, must have 
exactly one independent time-master. This fact is a natural result of the requirement to avoid overflow and 
underflow in finite size buffers, while maintaining synchronization of the presentation of data. All other 
synchronized entities must slave the timing of their operation to the time-master. If a decoder attempts to 
have more than one simultaneous time-master it may experience problems with buffer management or 
synchronization. 

A decoder system has complete freedom in choosing which entity is the time-master. Typically these 
entities include the video decoder, the audio decoder, a separate STC, or the data source. Whichever entity is 
the time-master must communicate to the others the correct value of the STC. A time slave will typically 
maintain a local STC which is incremented nominally at 90 kHz between updates or corrections. In this 
way each entity has a continuously updated value of the STC which is nominally correct and which it uses 
to compare with the time-stamps. 

Two examples are presented to illustrate different approaches to designing a decoder. One uses the audio 
decoder's clock as the time-master, and the other relies on the DSM clock as the time-master. 

A.3.1.3 Example: audio as time-master 

In this first example, the audio decoder is the time-master in a decoding system. Its operation is described 
here and illustrated in figure A.3. 

The system time clock (STC) is typically initialized to be equal to the value encoded in the first SCR field 
when that field enters the decoder's buffer. Thereafter the audio decoder controls the STC. As the audio 
decoder decodes audio AUs and presents audio PUs, it finds PTS fields associated with some of the audio 
PUs. As the beginning of each PU is output to the user, the associated PTS field contains the correct value 
of the decoder's STC in an idealized decoder following the STD model. The audio decoder may use this 
value to update the STC immediately, or to control the STC values via a control loop. 

The other decoders then use this STC to determine the correct time to present their decoded data, at the times 
when their PTS fields are equal to the current value of the STC. 
Figure A.3 - Example of decoding system - audio time-master

Note that the data source (DSM or channel) must provide data to the decoder on a schedule determined by the 
SCR and mux_rate field values and the decoder's STC. This time relationship is necessary in order to 
manage the decoder buffers. Buffer management is described further in A.3.3. 

The DSM control mechanism obtains data from the DSM at a rate at least equal to that specified in the
mux_rate field for each pack until the next SCR field is received. It is not necessary to obtain more data
from the DSM until the STC value equals the most recently received SCR value. If more data is read, more
buffering will be required.

A.3.1.4 Example: DSM as time-master 

In this second example, illustrated in figure A.4, the DSM is the time-master, and the audio and video 
decoders are implemented as separate decoder subsystems, each receiving the complete multiplexed data 
stream and extracting and using only that portion of the stream needed. 
Figure A.4 - Example of decoding system - DSM time-master

Each decoder receives and parses the complete multiplexed data stream and extracts the system layer
information and the coded data needed by that decoder. Synchronization is implemented by the individual
decoders slaving their timing to the DSM. The DSM timing is indicated by the SCR fields, which contain
the expected value of the decoder's STC at the time that the last byte of the SCR is received by the decoder.
Each decoder has a separate STC that is initialized to the first value of SCR received and increments at a
nominal 90 kHz rate. The correct timing of the STC is maintained by ensuring that the STC is equal to
the SCR values at the time that the SCRs are received. The STC may be maintained either by updating the
STC with the value of the SCRs or via a control loop, using the SCR values as reference inputs.
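
Both examples rely on a local STC that counts at a nominal 90 kHz and is corrected by reference values: audio PTSs in A.3.1.3, SCRs in A.3.1.4. The class below is a hypothetical sketch of the two options named in the text: loading each reference value directly (loop_gain = 0), or additionally steering the counting rate with a simple frequency correction. The loop design is an assumption, not a method specified by ISO/IEC 11172, and wrap-around of the 33-bit values is ignored.

```python
from typing import Optional


class SlavedSTC:
    """Local STC counting at a nominal 90 kHz, corrected by reference values."""

    def __init__(self, loop_gain: float = 0.0):
        self.loop_gain = loop_gain       # 0.0: just load each reference value
        self.rate = 90_000.0             # ticks per second of the local clock
        self._anchor_ticks: Optional[float] = None  # STC at the last reference
        self._anchor_time: Optional[float] = None   # local time of that reference

    def read(self, local_time: float) -> float:
        """Current STC value; valid after the first reference."""
        return self._anchor_ticks + (local_time - self._anchor_time) * self.rate

    def on_reference(self, ref_ticks: int, local_time: float) -> None:
        """Apply a reference value (e.g. an SCR at the arrival of its last byte)."""
        if self._anchor_ticks is not None and self.loop_gain > 0.0:
            elapsed = local_time - self._anchor_time
            error = ref_ticks - self.read(local_time)
            if elapsed > 0:
                # Steer the rate toward the one that would have avoided the error.
                self.rate += self.loop_gain * error / elapsed
        # Initialise, or re-anchor so that the phase error is removed.
        self._anchor_ticks, self._anchor_time = float(ref_ticks), local_time
```
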
A.3.2 Decoder start-up synchronization
A.3.2.1 Finding start codes at random access

The syntax specified in this part of ISO/IEC 11172 fully specifies the location of the system start codes. If
decoding of the stream begins with the first byte, all start codes can be found without uncertainty. After
random access to the stream, or in broadcast applications, where decoding may begin at an arbitrary byte,
the problem of finding start codes must be solved.

The thirty-two bit pack and packet start codes are constructed so that they cannot occur in video data, which
is expected to be the largest portion of the data in most applications. Therefore parsing can start after a
random access with a low probability of incorrectly identifying packet data as a start code. Nonetheless the
probability is not zero, because start code emulation may occur in audio or private streams.

If additional protection is desired, for example where a large number of audio streams are multiplexed or
there is a large amount of private data, a parser could detect a 32-bit pack_start_code followed 8 bytes later
by a 24-bit packet_start_code_prefix. Once a probable system start code is found by a parser, the
packet_length field may be used to predict the position of the next start code. In this way the probability of
incorrectly identifying coded data as a start code decreases geometrically as each successive start code is
found. A decoder performing this function has the options of either discarding or saving data until parsing
is operating with a sufficient level of confidence. Decoding can then begin with the earliest available data
for which system start codes are known.
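
The search strategy above can be sketched as follows. It assumes that a pack header occupies exactly 12 bytes (the 32-bit pack_start_code plus the 8 bytes mentioned above), and that a system header or packet carries a 16-bit length field immediately after its start code, as packet_length does; the function names are hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"
PACK_START_CODE = 0xBA
ISO_11172_END_CODE = 0xB9


def find_start_codes(data: bytes, start: int = 0):
    """Yield (offset, code byte) for each 0x000001 prefix found in data."""
    i = data.find(START_CODE_PREFIX, start)
    while i != -1 and i + 3 < len(data):
        yield i, data[i + 3]
        i = data.find(START_CODE_PREFIX, i + 1)


def confirmed_chain_length(data: bytes, p: int, max_steps: int = 8) -> int:
    """Follow predicted start-code positions from offset p.

    Returns how many consecutive start codes were found where the previous
    header said they would be; a longer chain means higher confidence.
    """
    count = 0
    while (count < max_steps and p + 4 <= len(data)
           and data[p:p + 3] == START_CODE_PREFIX):
        count += 1
        code = data[p + 3]
        if code == ISO_11172_END_CODE:
            break                                    # nothing follows the end code
        if code == PACK_START_CODE:
            p += 12                                  # 32-bit code plus 8 bytes
        elif p + 6 <= len(data):                     # system header or packet
            p += 6 + ((data[p + 4] << 8) | data[p + 5])  # 16-bit length field
        else:
            break
    return count
```

A parser would typically start decoding only once confirmed_chain_length reaches a threshold chosen for the application.
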

If the application includes a means of directly addressing known system start codes, then the probability of
incorrect parsing of start codes can be made zero.

It is possible for a decoder to "switch channels"; that is, for it to stop decoding one ISO/IEC 11172
multiplexed stream and to start decoding another. This function generally requires that decoding of the
second stream start at an unknown byte location. Switching channels is possible, but involves the flushing
of decoder buffers and introduces delay. The amount of delay depends on the frequency of start codes in the
second stream, as well as on the exact location where decoding starts.

A.3.2.2 System layer startup considerations 

Once the decoding system has locked on to the data stream it can begin decoding data. 

Decoding systems determine the correct time to start decoding by comparing the DTS (or PTS) fields,
extracted from the stream or computed as described above, with the current value of the STC. The delay
from the time the decoder begins to process data until it can begin to present decoded data is bounded from
below by the start-up delay implied by the SCR and PTS fields. A decoder following the STD model may
produce decoded output as soon as the following conditions are met:

a) At least one SCR field has been extracted and the STC is synchronized with the DSM via the
SCR and mux_rate fields.

b) At least one PTS has been extracted.

c) The PTS for an AU which is available is equal to the current STC value.

DTS fields may be used by a decoder to control input and reorder buffering.

In addition there may be other constraints imposed by the elementary decoders (for example the need for
sequence layer information and an I-picture in video coded according to ISO/IEC 11172-2).

This guarantees that the streams will be locked, but it does not necessarily ensure that the elementary
decoders are ready to present data at the same instant. It may be necessary to discard some decoded audio or
video and to wait until all elementary decoders are ready. In general there will not be audio and video PUs
that start at exactly the same time, that is, with equal PTS values. Thus exact simultaneous start-up of audio
and video may require muting some audio samples.
A.3.2.3 Coding layer startup considerations

Video decoding requires sequence layer information and must begin at an I-picture.

A.3.2.4 Compensation of actual decoding delays 

An actual decoder may introduce decoding and presentation delays compared with the values specified in
the STD, and may use buffers larger than those of the STD. Decoding later than the times indicated by DTS
in turn requires additional video decoder buffering. If the same delay period is added to the effective
values of PTS and DTS for all the elementary streams, synchronization can be maintained at the output of
the decoder.

A.3.2.5 Channel smoothing

In general, a channel between the DSM and the decoder requires additional smoothing buffering in the
decoder. If this smoothing is dimensioned appropriately, the performance of the system will be optimized.

A.3.3 Buffer management in the decoder

A.3.4 Time identification

The absolute time of presentation of the material contained in the coded data stream is indicated in the PTS
fields. These fields are defined as modulo 2^33 values of the 90 kHz system clock. The PTS fields may be
transcoded into other formats, such as SMPTE time code. There is no requirement that the values of the
time-stamp fields be initialized to any particular value at the start of a stream.

A decoder can locate the material associated with a particular value of presentation time by searching for a
PTS within an appropriate range of the desired value. SMPTE-like time codes are also defined in the video
coding layer defined in ISO/IEC 11172-2.

The upper bound on the time-stamp intervals specified in 2.4.5.2 is small enough for phase-lock loop
stability, but large enough to permit one PTS value per I-picture even when I-pictures occur slightly less
often than twice per second.
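PTS values count cycles of the 90 kHz system clock modulo 2^33. Comparisons such as the search for a desired presentation time must therefore tolerate wrap-around. The following is a minimal Python sketch. The helper names are hypothetical, and `pts_to_hms` is only a plain hours/minutes/seconds split, not a conversion to SMPTE time code (which would also depend on the picture rate).

```python
SYSTEM_CLOCK_HZ = 90_000   # 90 kHz system clock
PTS_MODULUS = 1 << 33      # PTS fields wrap modulo 2**33


def seconds_to_pts(seconds: float) -> int:
    """Nearest 90 kHz cycle count for a time in seconds, wrapped to 33 bits."""
    return round(seconds * SYSTEM_CLOCK_HZ) % PTS_MODULUS


def pts_delta(later: int, earlier: int) -> int:
    """Signed cycle difference later - earlier, tolerating one wrap-around.
    Assumes the true separation is less than half the 2**33 range."""
    d = (later - earlier) % PTS_MODULUS
    return d - PTS_MODULUS if d >= PTS_MODULUS // 2 else d


def pts_to_hms(pts: int) -> tuple[int, int, int, int]:
    """Split a cycle count into (hours, minutes, seconds, leftover cycles)."""
    total_s, ticks = divmod(pts, SYSTEM_CLOCK_HZ)
    hours, rem = divmod(total_s, 3600)
    minutes, seconds = divmod(rem, 60)
    return hours, minutes, seconds, ticks
```

A 33-bit count of 90 kHz cycles wraps after 2^33 / 90 000 s, roughly 95 444 s or about 26,5 hours.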

A.4 Parameters for CD-ROM multiplexing

In this clause an example of a multiplex method for CD-ROM is presented. The example is developed for
one video and one audio elementary stream. The ISO/IEC 11172 multiplexed stream is stored on a
CD-ROM without additional error correction, a mode with 2 324 bytes in each sector. Packs are constructed to
be this length so that they may be stored one in each sector. The duration of one sector equals 1/75 s,
resulting in a total bitrate of 8 * 2 324 * 75 = 1 394 400 bits/s.

The audio stream is coded in stereo with the ISO/IEC 11172-3 audio Layer II coding method at a bitrate of
192 000 bits/s = 24 000 bytes/s. The sample rate used is 44 100 samples/s. Audio presentation units
are 1 152 samples each, and so the size of an audio access unit equals:

$$\frac{1\,152 \times 24\,000}{44\,100} \approx 626{,}94 \text{ bytes}$$

As this result is not an integer, most audio access units are 627 bytes but some are only 626 bytes. 
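The mix of 627-byte and 626-byte access units can be illustrated with exact rational arithmetic. This Python sketch assumes a simple cumulative-rounding rule: each access unit ends at the floor of its exact cumulative byte position. It shows the resulting proportions. It is not claimed to be the rule that an ISO/IEC 11172-3 encoder must use, and the function name is hypothetical.

```python
import math
from fractions import Fraction


def audio_au_sizes(count: int, samples_per_au: int = 1152,
                   bytes_per_second: int = 24_000,
                   sample_rate: int = 44_100) -> list[int]:
    """Whole-byte access unit sizes whose running total stays as close as
    possible (from below) to the exact fractional average size."""
    exact = Fraction(samples_per_au * bytes_per_second, sample_rate)  # 30720/49 bytes
    return [math.floor((k + 1) * exact) - math.floor(k * exact)
            for k in range(count)]
```

Over 49 access units the exact total is 30 720 bytes. Under this rule, 46 of them carry 627 bytes and 3 carry 626.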

The video stream is coded with a bitrate of 1 158 000 bits/s = 144 750 bytes/s. The value of B_vbv used
is 36 kbytes, leaving sufficient headroom in the 46-kbyte STD buffer of the Constrained System
Parameters for the multiplexing.

The packs are to coincide with the sectors. Each pack contains a pack header, one packet of coded audio or
coded video data, and one packet of a padding stream. Each packet of coded audio or coded video data contains
exactly 2 250 data bytes. The padding stream ensures that each pack, including the pack header, consists of
the number of bytes available in the data field of the sector in which the pack is stored. In sectors where all
2 324 bytes are available for the ISO/IEC 11172 multiplexed stream, packs are 2 324 bytes long. In
sectors where less than 2 324 bytes are available, the size of the pack is reduced accordingly by decreasing
the size of the padding packet.

The coded audio bitrate of 24 000 bytes/s, with each sector containing 2 250 bytes, requires an average of
24 000 / 2 250 = 10 2/3 audio sectors/s. Similarly, the coded video bitrate of 144 750 bytes/s, with 2 250 bytes
per sector, requires an average of 144 750 / 2 250 = 64 1/3 video sectors/s. In total, therefore,
exactly 75 sectors of audio and video data are required each second for the combined bitstream, exactly filling
the total bandwidth of the CD-ROM.
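The sector budget above can be checked with exact fractions. A short Python sketch follows; the constant and function names are illustrative only.

```python
from fractions import Fraction

SECTOR_BYTES = 2324        # bytes per CD-ROM sector in this mode
SECTORS_PER_SECOND = 75
PAYLOAD_PER_SECTOR = 2250  # packet data bytes carried in each pack/sector


def sector_budget(audio_bytes_per_s: int, video_bytes_per_s: int):
    """Return (total channel bitrate in bits/s, audio sectors/s, video sectors/s)."""
    total_bits = 8 * SECTOR_BYTES * SECTORS_PER_SECOND
    audio = Fraction(audio_bytes_per_s, PAYLOAD_PER_SECTOR)
    video = Fraction(video_bytes_per_s, PAYLOAD_PER_SECTOR)
    return total_bits, audio, video
```

With 24 000 bytes/s of audio and 1 158 000 / 8 = 144 750 bytes/s of video, the two sector rates are exactly 32/3 and 193/3, which add up to 75.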

Interleaving the audio and video sectors must not cause the STD buffers to overflow or underflow. Many
interleaving schemes are possible that will lead to a multiplexed stream following the Constrained System
Parameters. In this example a simple interleaving scheme is used that repeats every 3 s (225 sectors). The
scheme starts with 6 video sectors followed by one audio sector. This pattern is repeated 31 times,
resulting in an interleave of 217 sectors. The last pattern in the interleave scheme consists of 7 video
sectors followed by 1 audio sector. The three-second period of 225 sectors contains 32 audio sectors and
193 video sectors. On average there are 193/3 = 64 1/3 video sectors/s and 32/3 = 10 2/3 audio
sectors/s, as required.
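The interleave can be generated and its sector counts checked mechanically. This Python sketch only reproduces the counting argument above. It does not model STD buffer occupancy, which would also require the decoding times of the access units. The function names are hypothetical.

```python
def cdrom_interleave() -> list[str]:
    """One 3 s period (225 sectors): 31 x (6 video + 1 audio), then 7 video + 1 audio."""
    period: list[str] = []
    for _ in range(31):
        period += ["V"] * 6 + ["A"]
    period += ["V"] * 7 + ["A"]
    return period


def longest_video_run(period: list[str]) -> int:
    """Largest number of consecutive video sectors in the period."""
    run = best = 0
    for sector in period:
        run = run + 1 if sector == "V" else 0
        best = max(best, run)
    return best
```

The longest run of consecutive video sectors is 7. An audio sector therefore arrives at least every 8 sectors, which is about 107 ms at 75 sectors/s.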
A.5 Example of an ISO/IEC 11172 multiplexed stream

A sample ISO/IEC 11172 multiplexed stream is presented here to illustrate the syntax and semantic rules
governing the generation of such streams. This example does not use the same parameters defined in the
previous clause. The sample stream is a constrained system parameter stream combining two elementary
streams, one video and one audio. The stream so generated, with place holders for coded audio and video
data, is listed in A.5.9 "Sample data stream". The elementary streams have the following specifications.

A.5.1 Audio

Layer II encoding
48 kHz sample rate
24 000 bytes/s rate for a pair of stereo channels
1 152 samples per presentation unit
576 bytes per access unit

A.5.2 Video

Constrained parameter video encoding at 150 000 bytes/s. 

25 Hz picture rate source. 

40 * 1 024 Byte video buffer verifier 

The order of pictures at the decoder input is 
1I 4P 2B 3B 7P 5B 6B 10P 8B 9B 13I 11B 12B 16P 14B 15B 19P 17B 18B 22P 20B 21B 25I

I pictures coded at 19 000 bytes each 
P pictures coded at 10 000 bytes each 
B pictures coded at 2 800 or 2 900 (2 875 byte average) each 

A.5.3 Multiplexing strategy

The example employs packets of length 2 048 bytes for both audio and video. The multiplex starts with
thirteen video packets to limit audio buffering requirements. Thereafter, one audio packet is interleaved with
every 6 to 7 video packets to match the 6,25 ratio of video bitrate to audio bitrate.

For simplicity, packets are constructed with a common number of packet_data_byte entries. Stuffing bytes
are used to ensure that all packets have 20 header bytes and 2 028 data bytes.

A pack is generated every third packet. This structure is somewhat arbitrary, but leads to a pack rate of
roughly 29 Hz, comfortably over the 1 to 2 Hz requirement of 2.4.5.2 (Coding of the
system_clock_reference). The cost of such frequent pack formation is not great: all pack headers except the
first are 12 bytes long, so pack headers account for some 0,2 % of the total bitrate.

The sample bitstream is long-word aligned. That is, all packets and all packet data (except the initial
padding stream packet) start at 32-bit boundaries. Because the first pack header is 30 bytes long (it contains
18 bytes of system header information), a special padding stream packet appears in the first pack. This
10-byte packet guarantees long-word alignment for subsequent packets.
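The alignment and overhead figures in this subclause can be reproduced from the packet and header sizes alone. The Python sketch below assumes the layout described here: a 30-byte first pack header followed by the 10-byte padding packet, 12-byte pack headers thereafter, and three 2 048-byte packets per pack. The names are hypothetical.

```python
FIRST_PACK_HEADER = 30   # 12-byte pack header + 18-byte system header
PADDING_PACKET = 10      # special padding stream packet in the first pack
PACK_HEADER = 12         # every later pack header
PACKET_BYTES = 2048      # every audio or video packet
PACKETS_PER_PACK = 3


def pack_layout(num_packs: int) -> tuple[list[int], list[int]]:
    """Byte offsets of each pack and of each audio/video packet."""
    pack_offsets: list[int] = []
    packet_offsets: list[int] = []
    pos = 0
    for k in range(num_packs):
        pack_offsets.append(pos)
        pos += (FIRST_PACK_HEADER + PADDING_PACKET) if k == 0 else PACK_HEADER
        for _ in range(PACKETS_PER_PACK):
            packet_offsets.append(pos)
            pos += PACKET_BYTES
    return pack_offsets, packet_offsets
```

With these sizes every packet begins on a 4-byte boundary, and each pack after the first is 6 156 bytes long. The pack-header overhead is therefore 12 / 6 156, roughly 0,19 %.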

To summarize, the stream is composed of packs and packets as follows:

Pack 1
  header (includes system header)      30 bytes
  Padding stream packet                10 bytes
  Video packet #1                   2 048 bytes
  Video packet #2                   2 048 bytes
  Video packet #3                   2 048 bytes

Pack 2
  header                               12 bytes
  Video packet #4                   2 048 bytes
  Video packet #5                   2 048 bytes
  Video packet #6                   2 048 bytes

Pack 3
  header                               12 bytes
  Video packet #7                   2 048 bytes
  Video packet #8                   2 048 bytes
  Video packet #9                   2 048 bytes

Pack 4
  header                               12 bytes
  Video packet #10                  2 048 bytes
  Video packet #11                  2 048 bytes
  Video packet #12                  2 048 bytes

Pack 5
  header                               12 bytes
  Video packet #13                  2 048 bytes
  Audio packet #1                   2 048 bytes
  Video packet #14                  2 048 bytes


A.5.4 System clock reference (SCR)

Bytes 5 to 9 of every pack header contain the encoded system_clock_reference fields. The multiplexed
stream's data rate is computed from the data in A.5.1 to A.5.3 and the following formula:

$$R_{mux} = (\text{video data rate} + \text{audio data rate}) \times \left(1 + \frac{\text{packet\_header\_size} + \text{pack\_header\_size} / \text{packets\_per\_pack}}{\text{packet\_data\_size}}\right)$$

$$R_{mux} = (150\,000 + 24\,000) \times \left(1 + \frac{20 + 12/3}{2\,028}\right) \approx 176\,059{,}17 \text{ bytes/s}$$

This value can be rounded to 176 059 bytes/s without affecting the values in the data stream in this
particular example.
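The R_mux computation can be reproduced with exact rational arithmetic. In this Python sketch the function and parameter names are illustrative, and all rates are in bytes/s.

```python
from fractions import Fraction


def multiplex_rate(video_rate: int, audio_rate: int, packet_data_size: int,
                   packet_header_size: int, pack_header_size: int,
                   packets_per_pack: int) -> Fraction:
    """Multiplex rate in bytes/s: the elementary-stream rate scaled up by the
    header bytes carried with each packet_data_size bytes of payload
    (each pack header is shared among the packets of its pack)."""
    overhead = packet_header_size + Fraction(pack_header_size, packets_per_pack)
    return (video_rate + audio_rate) * (1 + overhead / packet_data_size)
```

For the example, `multiplex_rate(150_000, 24_000, 2028, 20, 12, 3)` is exactly 174 000 * 2 052 / 2 028 bytes/s, which rounds to 176 059.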

R_mux and the 90 kHz clock frequency are used by the encoder to convert SCR field byte indices to
system_clock_reference values. The first SCR field, equal to 3 904, simply reflects a non-zero starting
value for the encoder's clock. Subsequent SCR fields evaluate to:
Pack   system_clock_reference
1       3 904
2       7 065
3      10 212
4      13 359
5      16 506
6      19 653
7      22 800
8      25 947



To understand the source of these numbers, consider the second pack's SCR value, SCR2. The SCR2 field
occurs 6 184 bytes after the first pack's SCR field, the first pack being 30 + 10 + 3 * 2 048 = 6 184 bytes
long. SCR2 is related to SCR1 in terms of the elapsed time. For this example's constant-rate byte delivery,
SCR2 is

SCR2 = SCR1 + 6 184 * 90 000 / 176 059
     = 7 065 (rounded to the nearest clock cycle)
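The whole SCR table can be reproduced in the same way. This Python sketch assumes round-to-nearest conversion, which reproduces the listed values. It also assumes the pack sizes of A.5.3: 6 184 bytes for the first pack and 6 156 bytes for each later one. The names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 90_000
R_MUX = 176_059                          # bytes/s, rounded as in the text
SCR_1 = 3_904                            # encoder's starting clock value
FIRST_PACK_BYTES = 30 + 10 + 3 * 2048    # headers, padding packet, three packets
PACK_BYTES = 12 + 3 * 2048               # every later pack


def scr_values(num_packs: int) -> list[int]:
    """SCR of each pack for constant-rate delivery, rounded to the nearest
    90 kHz cycle using integer arithmetic (halves rounded up)."""
    values = []
    for k in range(num_packs):
        offset = 0 if k == 0 else FIRST_PACK_BYTES + (k - 1) * PACK_BYTES
        ticks = (offset * SYSTEM_CLOCK_HZ + R_MUX // 2) // R_MUX
        values.append(SCR_1 + ticks)
    return values
```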

A.5.5 Presentation time-stamps (PTS)

The video coding model used for this example leads to coded pictures of the type:

1I 4P 2B 3B 7P 5B 6B 10P 8B 9B 13I 11B 12B 16P 14B 15B 19P 17B 18B 22P 20B 21B 25I

Recalling that coded I-, P- and B-pictures are assumed to be 19 000, 10 000 and 2 800 to 2 900 bytes,
respectively, and that packets contain 2 028 bytes of data each, it follows that picture start codes occur in
video packet #1 (I-picture), video packet #10 (P-picture), video packet #15 (B-picture), etc. This is reflected
by the presence of PTS fields in video packets 1, 10, 15, etc., in the sample stream listing.
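The reasoning above can be expressed as a short Python sketch. It assumes that the pictures, in coded order (1I, 4P, 2B, 3B, 7P, ...), occupy the video packet data end to end with no other bytes in between. It also picks 2 800 and 2 900 bytes for the first two B-pictures. The function name is hypothetical. The result matches the PTS-bearing packets #1, #10, #15, #16 and #18 in the listing of A.5.9.

```python
PACKET_DATA = 2028  # packet_data bytes per video packet


def start_packets(picture_sizes: list[int]) -> list[int]:
    """1-based index of the video packet in which each picture begins,
    assuming pictures are laid end to end from the first data byte."""
    packets, pos = [], 0
    for size in picture_sizes:
        packets.append(pos // PACKET_DATA + 1)
        pos += size
    return packets
```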

In this example N, the number of coded pictures between I-pictures, equals 12. The number of consecutive
B-pictures (M-1) between I- or P-pictures equals two, and thus M=3.

The audio coding model used for this example employs 576-byte access units, hence every 2 048-byte audio
packet contains an access unit start code. All audio packets contain PTS fields.

The value of an elementary stream's first decoding time-stamp (DTS) field (or PTS, if the two are equal),
when compared with the initial SCR field, determines the decoder start-up delay for that stream. In the
example, the first video DTS field has the value 22 804. The difference between the first pack's SCR value
and the first video packet's DTS value is:

start-up delay = (22 804 - 3 904 cycles) * (1 000 ms/s) / (90 000 cycles/s)
               = 210 ms

This delay is required to prevent overflow or underflow in the system target decoder. It tells the decoder that 
the first I picture should be decoded 210 ms and presented 250 ms after reading the last byte of the first SCR 
field in the multiplexed stream. 

Note that the first PTS field in the audio stream equals 26 395, a number slightly lower than the video's. 
This inequality arises if the video and audio encoders are not turned on at exactly the same instant, and does 
not imply synchronization error. 

The system_audio_lock_flag is set in the system header of the sample bitstream, but the
system_video_lock_flag is reset. Therefore, decoders may assume a rational relationship between the audio
clock and the system time clock, but may not assume such a relationship between the video clock and the
system time clock. The PTS and DTS values present in the stream are consistent with exact clocks for both
video and audio; in practice, however, because the video clock is not locked, some drift would appear in
video time-stamps. Over one second, or 90 000 clock cycles, errors of 50 parts per million would lead to
PTS values differing from the nominal values by 4 or 5. The discrepancy accumulates over time.

A.5.6 Decoding time-stamp (DTS)

For I- and P-pictures, it is generally true that system target decoder operations for decoding and presentation
occur at different times. Steady-state operation with this example's GOP structure (M=3, N=12) leads to
I- and P-pictures being decoded three picture periods before their presentation. Thus, video packet #10 has DTS
equal to 26 404 but PTS equal to 37 204. This difference of 10 800 clock cycles (120 ms) requires the
P-picture to be stored in the system target decoder's reorder buffer for 3 picture periods.

Analysis of DTS and PTS values for the first I-picture (video packet #1) reveals a relationship needed to 
initialize the reorder buffer. The I-picture is decoded when the decoder's clock reaches 22 804, but nothing is 
displayed. The initialization is complete 40 ms later when the P-picture discussed in the previous paragraph 
is decoded, and the I-picture is displayed. 
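The decoding and presentation times of the video stream follow a regular pattern, sketched here in Python. The sketch assumes steady-state operation from the first picture. The i-th picture in coded order (counting from 0) is decoded at 210 + 40i ms, and the picture with display number n is presented at 250 + 40(n - 1) ms. A time in milliseconds becomes a time-stamp value by converting it to 90 kHz cycles and adding the first SCR value, 3 904. The names are hypothetical. The results agree with the values quoted in A.5.5, in this subclause and in Table A.1.

```python
SCR_1 = 3_904            # first SCR value in the example
FIRST_DECODE_MS = 210    # first I-picture decoded 210 ms after the first SCR
FIRST_PRESENT_MS = 250   # and presented at 250 ms
PICTURE_MS = 40          # 25 Hz picture rate


def schedule(coded_order: list[str]) -> list[tuple[str, int, int]]:
    """(picture, decoding ms, presentation ms) for pictures named like '4P',
    where the number is the picture's position in display order."""
    rows = []
    for i, name in enumerate(coded_order):
        display = int(name[:-1])
        decode = FIRST_DECODE_MS + i * PICTURE_MS
        present = FIRST_PRESENT_MS + (display - 1) * PICTURE_MS
        rows.append((name, decode, present))
    return rows


def ms_to_ticks(ms: int) -> int:
    """Time-stamp value in 90 kHz cycles, offset by the first SCR."""
    return SCR_1 + ms * 90
```

For B-pictures the two times coincide, which is why those pictures carry only a PTS.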

The second audio PTS field (value 35 035) lags the first by 8 640 clock cycles, or 96 ms. Audio
presentation units are 1 152 samples long, which, at a 48 kHz sampling rate, corresponds to 24 ms. The
second audio PTS field, therefore, appears in the stream after the start code for the fifth audio access unit.

A.5.7 Buffer sizes

The example documents a constrained system parameter stream with pictures conforming to the video
constrained parameters defined in Part 2 of this International Standard. The maximum allowable buffer sizes
in the STD for such streams are used. These are:

Video streams: 46 * 1 024 bytes
Audio streams:  4 * 1 024 bytes



A.5.8 Adherence to System Target Decoder (STD)

For a stream to be a valid ISO/IEC 11172 multiplexed stream, it must play on the system target decoder
without overflow or underflow of any STD buffer. Tables A.1 and A.2 track buffer occupancy for the
STD video and audio buffers, respectively. The tables demonstrate that the one-second-long sample
bitstream complies with the STD buffering requirements.



Table A.1 - System target decoder video buffer occupancy

Input picture   End-of-picture   Decoding /          Buffer
index and type  delivery time    presentation time   occupancy
(coded order)   (ms)             (ms)                (bytes)

                0
1I              109              210/250             34 568
4P              178              250/370             21 560
2B              194              290                 17 468
3B              211              330                 20 928
7P              280              370/490             23 584
5B              297              410                 20 196
6B              313              450                 22 676
10P             382              490/610             26 664
8B              399              530                 21 692
9B              427              570                 25 756
13I             548              610/730             27 884
11B             564              650                 15 848
12B             580              690                 17 900
16P             650              730/850             21 964
14B             678              770                 16 816
15B             694              810                 20 880
19P             763              850/970             23 016
17B             780              890                 20 072
18B             796              930                 22 460
22P             866              970/1 090           26 088
20B             882              1 010               23 052
21B             898              1 050               27 308
25I             1 019            1 090/1 210         31 372
The end-of-picture delivery time is the time of arrival of the final byte of the picture at the input of the
video buffer in the STD.

In preparing these tables, coded B-pictures were assumed to alternate between 2 800 and 2 900 bytes in a
manner leading to an overall video rate of 150 000 bytes/s.

Table A.2 - System target decoder audio buffer occupancy

AAU #   End-of-AAU       Decoding /          Buffer
        delivery time    presentation time   occupancy
        (ms)             (ms)                (bytes)

        152                                  —
1       155              250                 3 000
2       158              274                 3 480
3       161              298                 2 904
4       246              322                 2 328
5       249              346                 3 780
6       253              370                 3 204
7       256              394                 2 628
8       329              418                 3 904
9       333              442                 3 504
10      336              466                 2 928
11      409              490                 2 444
12      412              514                 3 804
13      416              538                 3 228
14      419              562                 2 652
15      492              586                 2 696
16      496              610                 3 528
17      499              634                 2 952
18      584              658                 2 376
19      587              682                 3 828
20      590              706                 3 252
21      594              730                 2 676



Each row in Tables A.1 and A.2 indicates timing and buffer occupancy for a single video or audio access
unit. The columns in each table are, from left to right:

1) Identification of the access unit.

2) The time of arrival of the final byte of the access unit.

3) The access unit's decoding and presentation time-stamps.

4) The number of bytes in the STD buffer immediately before extraction of the access unit.

Consider, for example, the row in Table A.1 for picture 4P. This picture's final byte occurs at byte number
31 444 in the multiplexed stream. The stream is delivered at a constant rate of 176 059 bytes/s. Therefore,
the delivery of picture 4P is complete 1 000 * 31 444 / 176 059 = 178 ms into the stream. The picture's
DTS and PTS values are encoded in the stream; they are 250 ms and 370 ms greater than the SCR of the
first pack. At time 250 ms, when the picture is decoded, the 22nd packet (an audio packet) is being
delivered. At that time the video buffer is not being filled. The buffer contains the contents of exactly 20
video packets, less one I-picture that was extracted 40 ms earlier. The buffer fullness is therefore
20 * 2 028 - 19 000 = 21 560 bytes.

By comparing decoding times with delivery times it is possible to see that underflow is avoided: as long
as an access unit has been completely delivered before it is required for decoding, underflow does not occur.

If the maximum buffer fullness immediately before decoding each access unit is compared with the STD
buffer size for the stream, it is possible to determine that buffer overflow is avoided. In this example the
video stream buffer never exceeds 46 kbytes and the audio buffer never exceeds 4 kbytes. Note that the late
placement of the first audio packet is necessary to avoid audio buffer overflow.
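The two checks just described can be applied mechanically to the rows of Table A.1, as in the following Python sketch. Like the text, it assumes that occupancy peaks immediately before each access unit is extracted, so comparing those values with the buffer size is enough to rule out overflow. Delivery and decoding times are the rounded millisecond values of the table. The names are hypothetical.

```python
VIDEO_STD_BUFFER = 46 * 1024  # bytes, constrained system parameter limit

# (picture, end-of-picture delivery ms, decoding ms, occupancy before decoding)
TABLE_A1 = [
    ("1I", 109, 210, 34_568), ("4P", 178, 250, 21_560), ("2B", 194, 290, 17_468),
    ("3B", 211, 330, 20_928), ("7P", 280, 370, 23_584), ("5B", 297, 410, 20_196),
    ("6B", 313, 450, 22_676), ("10P", 382, 490, 26_664), ("8B", 399, 530, 21_692),
    ("9B", 427, 570, 25_756), ("13I", 548, 610, 27_884), ("11B", 564, 650, 15_848),
    ("12B", 580, 690, 17_900), ("16P", 650, 730, 21_964), ("14B", 678, 770, 16_816),
    ("15B", 694, 810, 20_880), ("19P", 763, 850, 23_016), ("17B", 780, 890, 20_072),
    ("18B", 796, 930, 22_460), ("22P", 866, 970, 26_088), ("20B", 882, 1010, 23_052),
    ("21B", 898, 1050, 27_308), ("25I", 1019, 1090, 31_372),
]


def check_std(rows, buffer_size: int) -> list[str]:
    """Report access units that arrive too late (underflow) or that find the
    buffer above its limit just before extraction (overflow)."""
    problems = []
    for name, delivered_ms, decode_ms, occupancy in rows:
        if delivered_ms > decode_ms:
            problems.append(f"{name}: underflow")
        if occupancy > buffer_size:
            problems.append(f"{name}: overflow")
    return problems
```

For the sample stream the report is empty, and the largest occupancy (34 568 bytes, just before the first I-picture is decoded) stays below the 47 104-byte limit.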
A.5.9 Sample data stream



No. of                                                      Coded
bytes  Field description                                    values

4      pack_start_code (#1)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            1E81
3      marker_bit, mux_rate, marker_bit                     801B83
4      system_header_start_code                             000001BB
2      header_length                                        000C
3      marker_bit, rate_bound, marker_bit                   801B83
1      audio_bound, fixed_flag, CSPS_flag                   07
1      system_audio_lock_flag, system_video_lock_flag,      A1
       marker_bit, video_bound
1      reserved_byte                                        FF
1      stream_id (audio)                                    C0
2      '11', STD_buffer_bound_scale, STD_buffer_size_bound  C020
1      stream_id (video)                                    E3
2      '11', STD_buffer_bound_scale, STD_buffer_size_bound  E02E
3      packet_start_code_prefix                             000001
1      stream_id (padding)                                  BE
2      packet_length                                        0004
1      '0000 1111'                                          0F
1      '1111 1111'                                          FF
1      '1111 1111'                                          FF
1      '1111 1111'                                          FF
3      packet_start_code_prefix (#1V)                       000001
1      stream_id (video)                                    E3
2      packet_length                                        07FA
4      stuffing_bytes                                       FFFFFFFF
1      '0011', PTS-32 thru 30, marker_bit                   31
2      PTS-29 thru 15, marker_bit                           0001
2      PTS-14 thru 0, marker_bit                            CE49
1      '0001', DTS-32 thru 30, marker_bit                   11
2      DTS-29 thru 15, marker_bit                           0001
2      DTS-14 thru 0, marker_bit                            B229
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#2V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#3V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#2)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            3733
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#4V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_bytes                                       FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#5V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#6V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#3)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            4FC9
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#7V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#8V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#9V)                       000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#4)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            685F
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#10V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
2      stuffing_byte                                        FFFF
2      '01', STD_buffer_scale, STD_buffer_size              602E
1      '0011', PTS-32 thru 30, marker_bit                   31
2      PTS-29 thru 15, marker_bit                           0003
2      PTS-14 thru 0, marker_bit                            22A9
1      '0001', DTS-32 thru 30, marker_bit                   11
2      DTS-29 thru 15, marker_bit                           0001
2      DTS-14 thru 0, marker_bit                            CE49
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#11V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#12V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#5)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            80F5
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#13V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#1A)                       000001
1      stream_id (audio)                                    C0
2      packet_length                                        07FA
7      stuffing_bytes                                       FF....FF
2      '01', STD_buffer_scale, STD_buffer_size              4020
1      '0010', PTS-32 thru 30, marker_bit                   21
2      PTS-29 thru 15, marker_bit                           0001
2      PTS-14 thru 0, marker_bit                            CE37
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#14V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#6)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            998B
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#15V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
9      stuffing_byte                                        FF....FF
1      '0010', PTS-32 thru 30, marker_bit                   21
2      PTS-29 thru 15, marker_bit                           0001
2      PTS-14 thru 0, marker_bit                            EA69
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#16V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
9      stuffing_byte                                        FF....FF
1      '0010', PTS-32 thru 30, marker_bit                   21
2      PTS-29 thru 15, marker_bit                           0003
2      PTS-14 thru 0, marker_bit                            0689
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#17V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#7)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            B221
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#18V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
2      stuffing_byte                                        FFFF
2      '01', STD_buffer_scale, STD_buffer_size              602E
1      '0011', PTS-32 thru 30, marker_bit                   31
2      PTS-29 thru 15, marker_bit                           0003
2      PTS-14 thru 0, marker_bit                            7709
1      '0001', DTS-32 thru 30, marker_bit                   11
2      DTS-29 thru 15, marker_bit                           0003
2      DTS-14 thru 0, marker_bit                            22A9
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#19V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#20V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      pack_start_code (#8)                                 000001BA
1      '0010', SCR-32 thru 30, marker_bit                   21
2      SCR-29 thru 15, marker_bit                           0001
2      SCR-14 thru 0, marker_bit                            CAB7
3      marker_bit, mux_rate, marker_bit                     801B83
3      packet_start_code_prefix (#2A)                       000001
1      stream_id (audio)                                    C0
2      packet_length                                        07FA
7      stuffing_bytes                                       FF....FF
2      '01', STD_buffer_scale, STD_buffer_size              4020
1      '0010', PTS-32 thru 30, marker_bit                   21
2      PTS-29 thru 15, marker_bit                           0003
2      PTS-14 thru 0, marker_bit                            11B7
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#21V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X
3      packet_start_code_prefix (#22V)                      000001
1      stream_id                                            E3
2      packet_length                                        07FA
14     stuffing_byte                                        FF....FF
2 028  packet_data_byte                                     XXX...X



4      iso_11172_end_code                                   000001B9
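The 5-byte SCR, PTS and DTS fields in this listing share one layout. Each starts with a 4-bit prefix: '0010' for an SCR or a lone PTS, '0011' for a PTS that is followed by a DTS, and '0001' for that DTS. Next come the three most significant bits and a marker_bit, then two groups of 15 bits, each followed by a marker_bit. The Python sketch below converts between these fields and their 33-bit values. The function names are hypothetical.

```python
def encode_timestamp(prefix: int, value: int) -> bytes:
    """Pack a 33-bit SCR/PTS/DTS value into the 5-byte field layout:
    prefix(4) | bits 32..30 | marker, bits 29..15 | marker, bits 14..0 | marker."""
    value %= 1 << 33
    b0 = (prefix << 4) | (((value >> 30) & 0x7) << 1) | 1
    mid = (((value >> 15) & 0x7FFF) << 1) | 1
    low = ((value & 0x7FFF) << 1) | 1
    return bytes([b0]) + mid.to_bytes(2, "big") + low.to_bytes(2, "big")


def decode_timestamp(field: bytes) -> int:
    """Recover the 33-bit value from a 5-byte field, checking the marker bits."""
    if len(field) != 5 or not (field[0] & field[2] & field[4] & 1):
        raise ValueError("not a 5-byte time stamp with marker bits set")
    high = (field[0] >> 1) & 0x7
    mid = int.from_bytes(field[1:3], "big") >> 1
    low = int.from_bytes(field[3:5], "big") >> 1
    return (high << 30) | (mid << 15) | low
```

Applied to the listing, the first pack's SCR bytes 21 0001 1E81 decode to 3 904, and the PTS and DTS of video packet #1 decode to 26 404 and 22 804.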
A.6 Illustration of the structure of the ISO/IEC 11172 multiplex




Annex B

(informative) 

List of patent holders 



The user's attention is called to the possibility that, for some of the processes specified in this part of
ISO/IEC 11172, compliance with this International Standard may require use of an invention covered by
patent rights.

By publication of this part of ISO/IEC 11172, no position is taken with respect to the validity of this
claim or of any patent rights in connection therewith. However, each company listed in this annex has filed
with the Information Technology Task Force (ITTF) a statement of willingness to grant a license under
such rights that they hold on reasonable and nondiscriminatory terms and conditions to applicants desiring
to obtain such a license.

Information regarding such patents can be obtained from : 



AT&T 

32 Avenue of the Americas 
New York
NY 10013-2412 
USA 



Aware 

1 Memorial Drive. 



02142 Massachusetts 
USA 



Bellcore 

290 W Mount Pleasant Avenue 

Livingston 

NJ 07039 

USA 

The British Broadcasting Corporation 

Broadcasting House 

London 

W1A 1AA

United Kingdom 

British Telecommunications plc

Intellectual Property Unit 

13th Floor 

151 Gower Street 

London 

WC1E 6BA

United Kingdom 

CCETT 

4 Rue du Clos-Courtel 

BP 59 

F-35512 

Cesson-Sevigne Cedex 
France 
CNET

38-40 Rue du General Leclerc 
F-92131 Issy-les-Moulineaux 
France 

Compression Labs, Incorporated 

2860 Junction Avenue 

San Jose 

CA 95134 

USA 

CSELT 

Via G Reiss Romoli 274 

I-10148 Torino
Italy 

CompuSonics Corporation 

PO Box 61017 

Palo Alto 

CA 94306 

USA 

Daimler Benz AG 
PO Box 800 230 
Epplestrasse 225 

D-7000 Stuttgart 80
Germany 

Dornier GmbH

An der Bundesstrasse 31 

D-7990 Friedrichshafen 1

Germany 

Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V.
Leonrodstrasse 54
8000 Muenchen 19
Germany 

Hitachi Ltd

6 Kanda-Surugadai 4 chome 

Chiyoda-ku 

Tokyo 101 

Japan 

Institut für Rundfunktechnik GmbH
Floriansmühlstraße 60
8000 München 45
Germany 

International Business Machines Corporation 
Armonk 

New York 10504 
USA 

KDD Corporation 

2-3-2 Nishishinjuku
Shinjuku-ku 
Tokyo 

Japan 
Licentia Patent-Verwaltungs-GmbH
Theodor-Stern-Kai
D-6000 Frankfurt 70
Germany

Massachusetts Institute of Technology 

20 Ames Street 

Cambridge 

Massachusetts 02139 

USA 

Matsushita Electric Industrial Co. Ltd 

1006 Oaza-Kadoma

Kadoma 

Osaka 571 

Japan 

Mitsubishi Electric Corporation 

2-3 Marunouchi 

2-Chome 

Chiyoda-Ku 

Tokyo 

100 Japan 

NEC Corporation 

7-1 Shiba 5-Chome

Minato-ku 

Tokyo 

Japan 

Nippon Hoso Kyokai 
2-2-1 Jin-nan 
Shibuya-ku 
Tokyo 150-01 
Japan 

Philips Electronics NV 
Groenewoudseweg 1 
5621 BA Eindhoven
The Netherlands 

Pioneer Electronic Corporation 

4-1 Meguro 1-Chome 

Meguro-ku 

Tokyo 153 

Japan 

Ricoh Co, Ltd 
1-3-6 Nakamagome 
Ohta-ku 
Tokyo 143 
Japan 

Schawartz Engineering & Design 
15 Buckland Court 
San Carlos, CA 94070 
USA 






Exhibit 18, page 58 



© ISO/IEC 



ISO/IEC 11172-1: 1993 (E) 



Sony Corporation 
6-7-35 Kitashinagawa 
Shinagawa-ku 
Tokyo 141 
Japan 

Symbionics 

St John's Innovation Centre 

Cowley Road 

Cambridge 

CB4 4WS

United Kingdom 

Telefunken Fernseh und Rundfunk GmbH 
Göttinger Chaussee
D-3000 Hannover 91 
Germany 

Thomson Consumer Electronics 

9, Place des Vosges 

La Défense 5

92400 Courbevoie 

France 

Toppan Printing Co, Ltd 

1-5-1 Taito 

Taito-ku 

Tokyo 110 

Japan 

Toshiba Corporation 
1-1 Shibaura 1-Chome
Minato-ku 
Tokyo 105 
Japan 

Victor Company of Japan Ltd 

12 Moriya-cho 3 chome 

Kanagawa-ku 

Yokohama 

Kanagawa 221

Japan 
... video recording, data storage device, digital storage, coded ...

Price based on 53 pages 






INTERNATIONAL STANDARD ISO/IEC 11172-2

First edition
1993-08-01

Information technology — Coding of
moving pictures and associated audio for
digital storage media at up to about
1,5 Mbit/s —

Part 2: Video

Technologies de l'information — Codage de l'image animée et du son
associé pour les supports de stockage numérique jusqu'à environ
1,5 Mbit/s —

Partie 2: Vidéo

Reference number
ISO/IEC 11172-2:1993(E)






ISO/IEC 11172-2: 1993 (E) 



Contents

Foreword

Introduction

Section 1: General

1.1 Scope

1.2 Normative references

Section 2: Technical elements

2.1 Definitions

2.2 Symbols and abbreviations

2.3 Method of describing bitstream syntax

2.4 Requirements

Annexes

A 8 by 8 Inverse discrete cosine transform

B Variable length code tables

C Video buffering verifier

D Guide to encoding video

E Bibliography

F List of patent holders

© ISO/IEC 1993

All rights reserved. No part of this publication may be reproduced or utilized in any form or
by any means, electronic or mechanical, including photocopying and microfilm, without
permission in writing from the publisher.

ISO/IEC Copyright Office • Case Postale 56 • CH-1211 Genève 20 • Switzerland
Printed in Switzerland. 






Foreword 



ISO (the International Organization for Standardization) and IEC (the Inter-
national Electrotechnical Commission) form the specialized system for 
worldwide standardization. National bodies that are members of ISO or 
IEC participate in the development of International Standards through 
technical committees established by the respective organization to deal 
with particular fields of technical activity. ISO and IEC technical com- 
mittees collaborate in fields of mutual interest. Other international organ- 
izations, governmental and non-governmental, in liaison with ISO and IEC 
also take part in the work. 

In the field of information technology, ISO and IEC have established a joint 
technical committee, ISO/IEC JTC 1. Draft International Standards adopted 
by the joint technical committee are circulated to national bodies for vot- 
ing. Publication as an International Standard requires approval by at least 
75 % of the national bodies casting a vote. 

International Standard ISO/IEC 11172-2 was prepared by Joint Technical 
Committee ISO/IEC JTC 1, Information technology, Sub-Committee SC 29, 
Coded representation of audio, picture, multimedia and hypermedia infor- 
mation. 

ISO/IEC 11172 consists of the following parts, under the general title
Information technology — Coding of moving pictures and associated audio
for digital storage media at up to about 1,5 Mbit/s: 

— Part 1: Systems

— Part 2: Video 

— Part 3: Audio 

— Part 4: Compliance testing 

Annexes A, B and C form an integral part of this part of ISO/IEC 11172.
Annexes D, E and F are for information only.
Introduction



Note - Readers interested in an overview of the MPEG Video layer should read this Introduction and then 
proceed to annex D, before returning to clauses 1 and 2. 



0.1 Purpose 

This part of ISO/IEC 11172 was developed in response to the growing need for a common format for
representing compressed video on various digital storage media such as CDs, DATs, Winchester disks and
optical drives. This part of ISO/IEC 11172 specifies a coded representation that can be used for
compressing video sequences to bitrates around 1,5 Mbit/s. The use of this part of ISO/IEC 11172 means
that motion video can be manipulated as a form of computer data and can be transmitted and received over
existing and future networks. The coded representation can be used with both 625-line and 525-line
television and provides flexibility for use with workstation and personal computer displays.

This part of ISO/IEC 11172 was developed to operate principally from storage media offering a continuous
transfer rate of about 1,5 Mbit/s. Nevertheless it can be used more widely than this because the approach 
taken is generic. 

0.1.1 Coding parameters

The intention in developing this part of ISO/IEC 11172 has been to define a source coding algorithm with a
large degree of flexibility that can be used in many different applications. To achieve this goal, a number of
the parameters defining the characteristics of coded bitstreams and decoders are contained in the bitstream
itself. This allows, for example, the algorithm to be used for pictures with a variety of sizes and aspect
ratios and on channels or devices operating at a wide range of bitrates.

Because of the large range of the characteristics of bitstreams that can be represented by this part of ISO/IEC
11172, a sub-set of these coding parameters known as the "Constrained Parameters" has been defined. The
aim in defining the constrained parameters is to offer guidance about a widely useful range of parameters.
Conforming to this set of constraints is not a requirement of this part of ISO/IEC 11172. A flag in the
bitstream indicates whether or not it is a Constrained Parameters bitstream.



Summary of the Constrained Parameters:

Horizontal picture size            Less than or equal to 768 pels
Vertical picture size              Less than or equal to 576 lines
Picture area                       Less than or equal to 396 macroblocks
Pel rate                           Less than or equal to 396x25 macroblocks/s
Picture rate                       Less than or equal to 30 Hz
Motion vector range                Less than -64 to +63,5 pels (using half-pel vectors)
                                   [backward_f_code and forward_f_code <= 4 (see table D.7)]
Input buffer size (in VBV model)   Less than or equal to 327 680 bits
Bitrate                            Less than or equal to 1 856 000 bits/s (constant bitrate)
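As a rough illustration, the limits in the summary above can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the function and parameter names are hypothetical, the picture area is taken as the number of 16 by 16 macroblocks needed to cover the picture, and the pel rate is checked as macroblocks per picture multiplied by the picture rate. It is not the normative definition of a Constrained Parameters bitstream.

```python
# Limits taken from the Constrained Parameters summary; names are hypothetical.
MAX_WIDTH_PELS = 768
MAX_HEIGHT_LINES = 576
MAX_MACROBLOCKS = 396               # picture area
MAX_MACROBLOCK_RATE = 396 * 25      # macroblocks per second (pel rate)
MAX_PICTURE_RATE_HZ = 30
MAX_F_CODE = 4
MAX_VBV_BUFFER_BITS = 327_680
MAX_BITRATE_BPS = 1_856_000


def macroblocks_per_picture(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover the picture (rounding up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def meets_constrained_parameters(width: int, height: int, picture_rate: float,
                                 forward_f_code: int, backward_f_code: int,
                                 vbv_buffer_bits: int, bitrate_bps: int) -> bool:
    """Check one set of coding parameters against the summary limits."""
    mbs = macroblocks_per_picture(width, height)
    return (width <= MAX_WIDTH_PELS
            and height <= MAX_HEIGHT_LINES
            and mbs <= MAX_MACROBLOCKS
            and mbs * picture_rate <= MAX_MACROBLOCK_RATE
            and picture_rate <= MAX_PICTURE_RATE_HZ
            and forward_f_code <= MAX_F_CODE
            and backward_f_code <= MAX_F_CODE
            and vbv_buffer_bits <= MAX_VBV_BUFFER_BITS
            and bitrate_bps <= MAX_BITRATE_BPS)
```

For example, a 352 by 288 picture needs 22 x 18 = 396 macroblocks, so at 25 Hz it sits exactly at the 396x25 macroblocks/s limit, while the same picture size at 30 Hz exceeds it.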



0.2 Overview of the algorithm 

The coded representation defined in this part of ISO/IEC 11172 achieves a high compression ratio while
preserving good picture quality. The algorithm is not lossless, as the exact pel values are not preserved
during coding. The choice of the techniques is based on the need to balance high picture quality and a high
compression ratio against the requirement for random access to the coded bitstream. Obtaining good
picture quality at the bitrates of interest demands a very high compression ratio, which is not achievable
with intraframe coding alone. The need for random access, however, is best satisfied with pure intraframe
coding. This requires a careful balance between intra- and interframe coding and between recursive and
non-recursive temporal redundancy reduction.
The algorithm first selects an appropriate spatial resolution for the signal. The algorithm then uses
block-based motion compensation to reduce the temporal redundancy. Motion compensation is used for causal
prediction of the current picture from a previous picture, for non-causal prediction of the current picture
from a future picture, or for interpolative prediction from past and future pictures. Motion vectors are
defined for each 16-pel by 16-line region of the picture. The difference signal, the prediction error, is
further compressed using the discrete cosine transform (DCT) to remove spatial correlation before it is
quantized in an irreversible process that discards the less important information. Finally, the motion
vectors are combined with the DCT information, and coded using variable length codes.

0.2.1 Temporal processing 

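Because of the conflicting requirements of random access and highly efficient compression, three main
picture types are defined. Intra-coded pictures (I-Pictures) are coded without reference to other pictures.
They provide access points to the coded sequence where decoding can begin, but are coded with only a
moderate compression ratio. Predictive coded pictures (P-Pictures) are coded more efficiently using motion
compensated prediction from a past intra or predictive coded picture and are generally used as a reference for
further prediction. Bidirectionally-predictive coded pictures (B-Pictures) provide the highest degree of
compression but require both past and future reference pictures for motion compensation.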



[Figure 1: diagram of a sequence of pictures; surviving labels: Bi-directional Prediction, Prediction]

Figure 1 - Example of temporal picture structure
0.2.2 Motion representation - macroblocks

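The choice of 16 by 16 macroblocks as the motion-compensation unit results from a trade-off between the
coding efficiency gained by using motion information and the overhead needed to store it. Each macroblock
can be one of a number of different types. Depending on the macroblock type, motion vector information and
other side information are encoded together with the compressed prediction error signal. The motion vectors
are encoded differentially with respect to the last coded motion vector, using variable-length codes.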
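The differential coding of motion vectors can be sketched as follows. This is a simplified Python illustration, not the normative syntax: it assumes the predictor starts at (0, 0) (for example at the start of a slice), works on integer vector components, and omits both the f_code range handling and the variable-length codes that the bitstream actually carries. The function names are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[int, int]


def encode_mv_differences(vectors: List[Vector]) -> List[Vector]:
    """Replace each (x, y) vector by its difference from the previous one."""
    prev = (0, 0)  # assumption: predictor reset to zero, e.g. at a slice start
    diffs = []
    for x, y in vectors:
        diffs.append((x - prev[0], y - prev[1]))
        prev = (x, y)
    return diffs


def decode_mv_differences(diffs: List[Vector]) -> List[Vector]:
    """Invert encode_mv_differences by accumulating the differences."""
    prev = (0, 0)
    vectors = []
    for dx, dy in diffs:
        prev = (prev[0] + dx, prev[1] + dy)
        vectors.append(prev)
    return vectors
```

For a run of similar vectors such as (4, 2), (5, 2), (5, 3), the differences (4, 2), (1, 0), (0, 1) are small, which is what allows short variable-length codes to be effective.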
0.2.3 Spatial redundancy reduction

Both original pictures and prediction error signals have high spatial redundancy. This part of ISO/IEC
11172 uses a block-based DCT method with visually weighted quantization and run-length coding. Each 8
by 8 block of the original picture for intra-coded macroblocks or of the prediction error for predictive-coded
macroblocks is transformed into the DCT domain where it is scaled before being quantized. After
quantization many of the coefficients are zero in value and so two-dimensional run-length and variable
length coding is used to encode the remaining coefficients efficiently.
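A minimal sketch of the transform and quantization step for one 8 by 8 block, written in plain Python for clarity rather than speed. It uses the orthonormal 2-D DCT-II and a simple uniform quantizer built from a per-coefficient weight matrix and a single scale factor; the exact weighting, rounding and scaling rules of this part of ISO/IEC 11172 are not reproduced here, and the names `dct_8x8` and `quantize` are hypothetical.

```python
import math
from typing import List

N = 8
Block = List[List[float]]


def _norm(k: int) -> float:
    """Normalisation factor of the orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_8x8(block: Block) -> Block:
    """Forward 2-D DCT-II of an 8x8 block (direct O(N^4) evaluation)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _norm(u) * _norm(v) * s
    return out


def quantize(coeffs: Block, weights: List[List[int]], scale: int) -> List[List[int]]:
    """Uniform quantization with step weight*scale, rounding half away from zero."""
    levels = []
    for crow, wrow in zip(coeffs, weights):
        row = []
        for c, w in zip(crow, wrow):
            level = int(abs(c) / (w * scale) + 0.5)
            row.append(level if c >= 0 else -level)
        levels.append(row)
    return levels
```

A flat block (every pel equal) concentrates all its energy in the dc-coefficient, so after quantization only one non-zero value remains, which is the situation that run-length coding exploits.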

0.3 Encoding

This part of ISO/IEC 11172 does not specify an encoding process. It specifies the syntax and semantics of
the bitstream and the signal processing in the decoder. As a result, many options are left open to encoders
to trade-off cost and speed against picture quality and coding efficiency. This clause is a brief description of
the functions that need to be performed by an encoder. Figure 2 shows the main functional blocks.

[Figure 2: block diagram; surviving labels: Regulator, Picture Re-order, Source input pictures]

where

DCT is discrete cosine transform

DCT⁻¹ is inverse discrete cosine transform

Q is quantization

Q⁻¹ is dequantization

VLC is variable length coding

Figure 2 - Simplified video encoder block diagram

The input video signal must be digitized and represented as a luminance signal and two colour difference
signals (Y, Cb, Cr). This may be followed by preprocessing and format conversion to select an appropriate
window, resolution and input format. This part of ISO/IEC 11172 requires that the colour difference
signals (Cb and Cr) are subsampled with respect to the luminance by 2:1 in both vertical and horizontal
directions and are reformatted, if necessary, as a non-interlaced signal.
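The 2:1 subsampling of the colour difference signals in both directions can be illustrated with a simple 2 by 2 averaging filter. The text above only requires the subsampled format and does not fix the filter, so the box average below (and the name `subsample_chroma`) is an assumption made for illustration.

```python
from typing import List


def subsample_chroma(plane: List[List[int]]) -> List[List[int]]:
    """Halve a colour difference plane horizontally and vertically.

    Each output sample is the rounded mean of a 2x2 neighbourhood
    (an assumed box filter; the filter choice is left to the encoder).
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("plane dimensions must be even")
    return [[(plane[2 * i][2 * j] + plane[2 * i][2 * j + 1]
              + plane[2 * i + 1][2 * j] + plane[2 * i + 1][2 * j + 1] + 2) // 4
             for j in range(width // 2)]
            for i in range(height // 2)]
```

A 352 by 288 chrominance plane produced this way becomes 176 by 144, so each 16 by 16 luminance area corresponds to one 8 by 8 area of Cb and one of Cr.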

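The encoder must choose which picture type to use for each picture. Having defined the picture types, the
encoder estimates motion vectors for each 16 by 16 macroblock in the picture. In P-Pictures one vector is
needed for each non-intra macroblock and in B-Pictures one or two vectors are needed.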
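Motion estimation itself is not specified by this part of ISO/IEC 11172. One common approach is exhaustive block matching: try every displacement in a search window and keep the one with the smallest sum of absolute differences (SAD). The sketch below assumes integer-pel displacements only (half-pel vectors are also allowed by this part of ISO/IEC 11172), a square search window, and skips candidates that would fall outside the reference picture; the function names are hypothetical.

```python
from typing import List, Tuple

Picture = List[List[int]]


def sad(cur: Picture, ref: Picture, bx: int, by: int,
        dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def full_search(cur: Picture, ref: Picture, bx: int, by: int,
                size: int = 16, search: int = 7) -> Tuple[int, int]:
    """Integer-pel (dx, dy) with minimum SAD within +/- search pels."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would read outside the reference picture.
            if (by + dy < 0 or by + dy + size > height
                    or bx + dx < 0 or bx + dx + size > width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv
```

The returned (dx, dy) is an offset from the block position in the current picture to the matching position in the reference picture. Exhaustive search is simple but costly, which is one reason encoders are free to use faster strategies.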

If B-Pictures are used, some reordering of the picture sequence is necessary before encoding. Because
B-Pictures are coded using bidirectional motion compensated prediction, they can only be decoded after the
subsequent reference picture (an I or P-Picture) has been decoded. Therefore the pictures are reordered by the
encoder so that the pictures arrive at the decoder in the order for decoding. The correct display order is
recovered by the decoder.
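The reordering from display order to coded order can be sketched as follows. Each picture is labelled by its type and display index, and every B-Picture is held back until the next I- or P-Picture has been emitted. The labelling convention and the function name are hypothetical, and the handling of B-Pictures left over at the end of the input (they are simply appended) is an assumption of this sketch, since such pictures would have no subsequent reference picture.

```python
from typing import List


def display_to_coded(types: str) -> List[str]:
    """Reorder pictures from display order to coded order.

    `types` holds one letter per picture in display order ('I', 'P' or 'B');
    each output label is the type followed by the display index, e.g. 'B1'.
    """
    coded: List[str] = []
    pending_b: List[str] = []
    for index, kind in enumerate(types):
        label = f"{kind}{index}"
        if kind == "B":
            pending_b.append(label)   # must wait for the next reference picture
        else:                         # I- or P-Picture: a reference picture
            coded.append(label)
            coded.extend(pending_b)
            pending_b.clear()
    coded.extend(pending_b)  # assumption: trailing B-Pictures appended as-is
    return coded
```

For the display order I B B P B B P, the coded order is I0 P3 B1 B2 P6 B4 B5: each pair of B-Pictures is transmitted after both of the reference pictures it depends on.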

The basic unit of coding within a picture is the macroblock. Within each picture, macroblocks are encoded 
in sequence, left to right, top to bottom. Each macroblock consists of six 8 by 8 blocks: four blocks of 
luminance, one block of Cb chrominance, and one block of Cr chrominance. See figure 3. Note that the 
picture area covered by the four blocks of luminance is the same as the area covered by each of the 
chrominance blocks. This is due to subsampling of the chrominance information to match the sensitivity of 
the human visual system.



[Figure 3: luminance (Y) blocks numbered 0, 1, 2, 3; one Cb block; one Cr block]

Figure 3 - Macroblock structure
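The block layout of a macroblock can be illustrated by extracting its six 8 by 8 blocks from the three component planes. The sketch assumes that the planes are lists of rows, that the chrominance planes are already subsampled 2:1 in both directions (so a 16 by 16 luminance area maps to an 8 by 8 chrominance area), and that the four luminance blocks are numbered 0 to 3 in raster order as in figure 3, followed by Cb and then Cr. The function names are hypothetical.

```python
from typing import List, Sequence

Plane = Sequence[Sequence[int]]


def _block(plane: Plane, x0: int, y0: int) -> List[List[int]]:
    """Cut the 8x8 block whose top-left sample is (x0, y0)."""
    return [list(row[x0:x0 + 8]) for row in plane[y0:y0 + 8]]


def macroblock_blocks(y_plane: Plane, cb_plane: Plane, cr_plane: Plane,
                      mb_x: int, mb_y: int) -> List[List[List[int]]]:
    """Return the six blocks of macroblock (mb_x, mb_y): Y0..Y3, Cb, Cr."""
    lx, ly = 16 * mb_x, 16 * mb_y   # top-left of the 16x16 luminance area
    cx, cy = 8 * mb_x, 8 * mb_y     # top-left of the 8x8 chrominance areas
    return [_block(y_plane, lx, ly), _block(y_plane, lx + 8, ly),
            _block(y_plane, lx, ly + 8), _block(y_plane, lx + 8, ly + 8),
            _block(cb_plane, cx, cy), _block(cr_plane, cx, cy)]
```

Iterating `mb_y` in the outer loop and `mb_x` in the inner loop visits macroblocks left to right, top to bottom, which matches the coding order described above.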

Firstly, for a given macroblock, the coding mode is chosen. It depends on the picture type, the 
effectiveness of motion compensated prediction in that local region, and the nature of the signal within the 
block. Secondly, depending on the coding mode, a motion compensated prediction of the contents of the 
block based on past and/or future reference pictures is formed. This prediction is subtracted from the actual 
data in the current macroblock to form an error signal. Thirdly, this error signal is separated into 8 by 8 
blocks (4 luminance and 2 chrominance blocks in each macroblock) and a discrete cosine transform is 
performed on each block. Each resulting 8 by 8 block of DCT coefficients is quantized and the
two-dimensional block is scanned in a zigzag order to convert it into a one-dimensional string of quantized DCT
coefficients. Fourthly, the side-information for the macroblock (mode, motion vectors etc) and the 
quantized coefficient data are encoded. For maximum efficiency, a number of variable length code tables are 
defined for the different data elements. Run-length coding is used for the quantized coefficient data.
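The zigzag scan and the run-length step can be sketched as below. The scan order is generated by walking the anti-diagonals of the 8 by 8 block in alternating directions, which gives the conventional zigzag sequence 0, 1, 8, 16, 9, 2, ... when positions are written as 8 x row + column. Each non-zero coefficient becomes a (run, level) pair, where run counts the zeros skipped since the previous non-zero coefficient. Separate coding of intra dc-coefficients and the actual variable length code tables are deliberately left out; `ZIGZAG` and `run_level_pairs` are hypothetical names.

```python
from typing import List, Tuple

# Zigzag order: walk the anti-diagonals (row + column constant), moving down
# the diagonal when its index is odd and up it when its index is even.
ZIGZAG: List[Tuple[int, int]] = sorted(
    ((r, c) for r in range(8) for c in range(8)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
)


def run_level_pairs(block: List[List[int]]) -> List[Tuple[int, int]]:
    """Scan a quantized 8x8 block in zigzag order into (run, level) pairs."""
    pairs: List[Tuple[int, int]] = []
    run = 0
    for r, c in ZIGZAG:
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are left to an end-of-block code
```

Because the scan visits low-frequency coefficients first, the non-zero values that survive quantization tend to appear early, leaving a long tail of zeros that costs nothing beyond the end-of-block code.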

A consequence of using different picture types and variable length coding is that the overall data rate is 
variable. In applications that involve a fixed-rate channel, a FIFO buffer may be used to match the encoder 
output to the channel. The status of this buffer may be monitored to control the number of bits generated 
by the encoder. Controlling the quantization process is the most direct way of controlling the bitrate. This 
part of ISO/IEC 11172 specifies an abstract model of the buffering system (the Video Buffering Verifier) in
order to constrain the maximum variability in the number of bits that are used for a given picture. This
ensures that a bitstream can be decoded with a buffer of known size. 
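The idea of checking that a bitstream can be decoded with a buffer of known size can be illustrated with a simplified constant-rate model. This is not the normative Video Buffering Verifier, which is specified in annex C; the sketch assumes that each picture's bits are removed instantaneously at picture intervals and that the channel then delivers bitrate / picture_rate bits before the next removal. The function name and parameters are hypothetical.

```python
from typing import Sequence


def fits_buffer(picture_bits: Sequence[int], bitrate: float, picture_rate: float,
                buffer_size: float, initial_fullness: float) -> bool:
    """Simplified constant-rate buffer check (not the normative VBV of annex C)."""
    fullness = initial_fullness
    bits_per_interval = bitrate / picture_rate  # delivered between removals
    for bits in picture_bits:
        if bits > fullness:
            return False        # underflow: picture not fully received in time
        fullness -= bits        # assumed instantaneous removal of the picture
        fullness += bits_per_interval
        if fullness > buffer_size:
            return False        # overflow: channel data would be lost
    return True
```

An encoder's rate control can run a model like this alongside encoding and raise the quantizer step when the buffer approaches underflow, which is the feedback loop described above.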

At this stage, the coded representation of the picture has been generated. The final step in the encoder is to 
regenerate I-Pictures and P-Pictures by decoding the data so that they can be used as reference pictures for 
subsequent encoding. The quantized coefficients are dequantized and an inverse 8 by 8 DCT is performed on 
each block. The prediction error signal produced is then added back to the prediction signal and limited to 
the required range to give a decoded reference picture. 
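The final reconstruction step, adding the decoded prediction error to the prediction and limiting the result, can be sketched per block. The sketch assumes 8-bit pels, so the required range is taken to be 0 to 255; `reconstruct_block` is a hypothetical name.

```python
from typing import List


def reconstruct_block(prediction: List[List[int]],
                      residual: List[List[int]]) -> List[List[int]]:
    """Add the decoded prediction error to the prediction and clip to 0..255."""
    return [[min(255, max(0, p + e)) for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prediction, residual)]
```

The encoder performs the same step as the decoder so that both hold identical reference pictures; otherwise prediction errors would accumulate from picture to picture.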

0.4 Decoding 

Decoding is the inverse of the encoding operation. It is considerably simpler than encoding as there is no 
need to perform motion estimation and there are many fewer options. The decoding process is defined by 
this part of ISO/IEC 11172. The description that follows is a very brief overview of one possible way of
decoding a bitstream. Other decoders with different architectures are possible. Figure 4 shows the main 
functional blocks. 
[Figure 4: block diagram; surviving labels, in signal order: Coded video bitstream, Buffer, MUX⁻¹, VLD, Q⁻¹, DCT⁻¹, Reconstructed output pictures; side inputs: Quantizer stepsize, Motion Vectors]

Where

DCT⁻¹ is inverse discrete cosine transform

Q⁻¹ is dequantization

MUX⁻¹ is demultiplexing

VLD is variable length decoding

Figure 4 - Basic video decoder block diagram

For fixed-rate applications, the channel fills a FIFO buffer at a constant rate with the coded bitstream. The 
decoder reads this buffer and decodes the data elements in the bitstream according to the defined syntax. 

As the decoder reads the bitstream, it identifies the start of a coded picture and then the type of the picture. 
It decodes each macroblock in the picture in turn. The macroblock type and the motion vectors, if present, 
are used to construct a prediction of the current macroblock based on past and future reference pictures that 
have been stored in the decoder. The coefficient data are decoded and dequantized. Each 8 by 8 block of 
coefficient data is transformed by an inverse DCT (specified in annex A), and the result is added to the 
prediction signal and limited to the defined range.

After all the macroblocks in the picture have been processed, the picture has been reconstructed. If it is an I- 
picture or a P-picture it is a reference picture for subsequent pictures and is stored, replacing the oldest stored 
reference picture. Before the pictures are displayed they may need to be reordered from the coded order to 
their natural display order. After reordering, the pictures are available, in digital form, for post-processing 
and display in any manner that the application chooses. 
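The decoder-side reordering can be sketched for a decoder that receives pictures in coded order. B-Pictures are output as soon as they are decoded, while each I- or P-Picture is held back until the next reference picture arrives or the stream ends. The labels (picture type followed by display index) and the function name are hypothetical conventions for illustration; a real decoder works on decoded pictures rather than strings.

```python
from typing import List, Optional


def coded_to_display(coded: List[str]) -> List[str]:
    """Emit pictures in display order from labels such as 'I0', 'B1'."""
    display: List[str] = []
    held_ref: Optional[str] = None
    for pic in coded:
        if pic[0] == "B":
            display.append(pic)           # B-Pictures are shown immediately
        else:
            if held_ref is not None:
                display.append(held_ref)  # previous reference is now due
            held_ref = pic                # keep the new reference picture
    if held_ref is not None:
        display.append(held_ref)          # flush the last reference at the end
    return display
```

Only one decoded reference picture has to be held back for display at any time, even though two reference pictures are kept for prediction.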

0.5 Structure of the coded video bitstream 



This part of ISO/IEC 11172 specifies a syntax for a coded video bitstream. This syntax contains six layers,
each of which supports either a signal processing or a system function:

Layers of the syntax          Function

Sequence layer                Random access unit: context
Group of pictures layer       Random access unit: video
Picture layer                 Primary coding unit
Slice layer                   Resynchronization unit
Macroblock layer              Motion compensation unit
Block layer                   DCT unit



0.6 Features supported by the algorithm 

Applications using compressed video on digital storage media need to be able to perform a number of 
operations in addition to normal forward playback of the sequence. The coded bitstream has been designed 
to support a number of these operations. 
0.6.1 Random access

Random access is an essential feature for video on a storage medium. Random access requires that any 
picture can be decoded in a limited amount of time. It implies the existence of access points in the 
bitstream, that is, segments of information that are identifiable and can be decoded without reference to other
segments of data. A spacing of two random access points (Intra-Pictures) per second can be achieved
without significant loss of picture quality. 

0.6.2 Fast search 

Depending on the storage medium, it is possible to scan the access points in a coded bitstream (with the 
help of an application-specific directory or other knowledge beyond the scope of this part of ISO/IEC
11172) to obtain a fast-forward and fast-reverse playback effect.

0.6.3 Reverse playback 

Some applications may require the video signal to be played in reverse order. This can be achieved in a 
decoder by using memory to store entire groups of pictures after they have been decoded before being 
displayed in reverse order. An encoder can make this feature easier by reducing the length of groups of 
pictures. 

0.6.4 Error robustness 

Most digital storage media and communication channels are not error-free. Appropriate channel coding 
schemes should be used and are beyond the scope of this part of ISO/IEC 11172. Nevertheless the
compression scheme defined in this part of ISO/IEC 11172 is robust to residual errors. The slice structure
allows a decoder to recover after a data error and to resynchronize its decoding. Therefore, bit errors in the
compressed data cause errors in the decoded pictures that are limited in area. Decoders may be able to use
concealment strategies to disguise these errors.
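Resynchronization after an error relies on start codes, which a decoder can find by scanning bytes. The sketch below looks for the next slice start code, assuming the start code prefix 0x000001 and slice start code values 0x01 to 0xAF as defined in the bitstream syntax of 2.4 (outside this excerpt); other start codes, such as that of a sequence header, are skipped. The function name is hypothetical.

```python
SLICE_START_MIN, SLICE_START_MAX = 0x01, 0xAF


def next_slice_start(data: bytes, pos: int) -> int:
    """Return the offset of the next slice start code at or after pos, or -1."""
    i = data.find(b"\x00\x00\x01", pos)
    while i != -1 and i + 3 < len(data):
        if SLICE_START_MIN <= data[i + 3] <= SLICE_START_MAX:
            return i
        i = data.find(b"\x00\x00\x01", i + 1)  # not a slice: keep scanning
    return -1
```

After detecting a corrupted macroblock, a decoder could discard data up to the offset returned here and resume decoding at that slice, so the damage stays confined to the rest of the current slice.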

0.6.5 Editing 

There is a conflict between the requirement for high coding efficiency and easy editing. The coding structure
and syntax have not been designed with the primary aim of simplifying editing at any picture. Nevertheless 
a number of features have been included that enable editing of coded data. 
Information technology — Coding of moving
pictures and associated audio for digital storage 
media at up to about 1,5 Mbit/s — 

Part 2: 

Video 

Section 1: General 

1.1 Scope 

This part of ISO/IEC 11172 specifies the coded representation of video for digital storage media and
specifies the decoding process. The representation supports normal speed forward playback, as well as 
special functions such as random access, fast forward playback, fast reverse playback, normal speed reverse 
playback, pause and still pictures. This part of ISO/IEC 11172 is compatible with standard 525- and 625- 
line television formats, and it provides flexibility for use with personal computer and workstation displays. 

ISO/IEC 11172 is primarily applicable to digital storage media supporting a continuous transfer rate up to
about 1,5 Mbit/s, such as Compact Disc, Digital Audio Tape, and magnetic hard disks. Nevertheless it can
be used more widely than this because of the generic approach taken. The storage media may be directly
connected to the decoder, or via communications means such as busses, LANs, or telecommunications
links. This part of ISO/IEC 11172 is intended for non-interlaced video formats having approximately 288
lines of 352 pels and picture rates around 24 Hz to 30 Hz.

1.2 Normative references 

The following International Standards contain provisions which, through reference in this text, constitute 
provisions of this part of ISO/IEC 11172. At the time of publication, the editions indicated were valid.
All standards are subject to revision, and parties to agreements based on this part of ISO/IEC 11172 are
encouraged to investigate the possibility of applying the most recent editions of the standards indicated
below. Members of IEC and ISO maintain registers of currently valid International Standards.

ISO/IEC 11172-1:1993 Information technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s - Part 1: Systems.

ISO/IEC 11172-3:1993 Information technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s - Part 3: Audio.

CCIR Recommendation 601-2 Encoding parameters of digital television for studios.
CCIR Report 624-4 Characteristics of systems for monochrome and colour television.
CCIR Recommendation 648 Recording of audio signals.

CCIR Report 955-2 Sound broadcasting by satellite for portable and mobile receivers, including Annex IV
Summary description of Advanced Digital System II.

CCITT Recommendation J.17 Pre-emphasis used on Sound-Programme Circuits.

IEEE Draft Standard P1180/D2 1990 Specification for the implementation of 8 x 8 inverse discrete cosine
transform.

IEC publication 908:1987 CD Digital Audio System.
Section 2: Technical elements
2.1 Definitions 

For the purposes of ISO/IEC 11172, the following definitions apply. If specific to a part, this is noted in
square brackets.

2.1.1 ac coefficient [video]: Any DCT coefficient for which the frequency in one or both dimensions
is non-zero. 

2.1.2 access unit [system]: In the case of compressed audio an access unit is an audio access unit. In
the case of compressed video an access unit is the coded representation of a picture. 

2.1.3 adaptive segmentation [audio]: A subdivision of the digital representation of an audio signal
in variable segments of time. 

2.1.4 adaptive bit allocation [audio]: The assignment of bits to subbands in a time and frequency 
varying fashion according to a psychoacoustic model. 

2.1.5 adaptive noise allocation [audio]: The assignment of coding noise to frequency bands in a 
time and frequency varying fashion according to a psychoacoustic model. 

2.1.6 alias [audio]: Mirrored signal component resulting from sub-Nyquist sampling. 

2.1.7 analysis filterbank [audio]: Filterbank in the encoder that transforms a broadband PCM audio
signal into a set of subsampled subband samples. 

2.1.8 audio access unit [audio]: For Layers I and II an audio access unit is defined as the smallest
part of the encoded bitstream which can be decoded by itself, where decoded means "fully reconstructed 
sound". For Layer III an audio access unit is part of the bitstream that is decodable with the use of 
previously acquired main information. 

2.1.9 audio buffer [audio]: A buffer in the system target decoder for storage of compressed audio data.

2.1.10 audio sequence [audio]: A non-interrupted series of audio frames in which the following 
parameters are not changed: 

- ID
- Layer
- Sampling Frequency
- For Layer I and II: Bitrate index

2.1.11 backward motion vector [video]: A motion vector that is used for motion compensation 
from a reference picture at a later time in display order. 

2.1.12 Bark [audio]: Unit of critical band rate. The Bark scale is a non-linear mapping of the frequency 
scale over the audio range closely corresponding with the frequency selectivity of the human ear across the 
band. 

2.1.13 bidirectionally predictive-coded picture; B-picture [video]: A picture that is coded 
using motion compensated prediction from a past and/or future reference picture. 

2.1.14 bitrate: The rate at which the compressed bitstream is delivered from the storage medium to the 
input of a decoder. 

2.1.15 block companding [audio]: Normalizing of the digital representation of an audio signal 
within a certain time period. 

2.1.16 block [video]: An 8-row by 8-column orthogonal block of pels.

2.1.17 bound [audio]: The lowest subband in which intensity stereo coding is used. 
2.1.18 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits
from the first bit in the stream.

2.1.19 byte: Sequence of 8 bits.

2.1.20 channel: A digital medium that stores or transports an ISO/IEC 11172 stream. 

2.1.21 channel [audio]: The left and right channels of a stereo signal.

2.1.22 chrominance (component) [video]: A matrix, block or single pel representing one of the 
two colour difference signals related to the primary colours in the manner defined in CCIR Rec 601. The
symbols used for the colour difference signals are Cr and Cb. 

2.1.23 coded audio bitstream [audio]: A coded representation of an audio signal as specified in 
ISO/IEC 11172-3. 

2.1.24 coded video bitstream [video]: A coded representation of a series of one or more pictures as 
specified in this part of ISO/IEC 11172.

2.1.25 coded order [video]: The order in which the pictures are stored and decoded. This order is not 
necessarily the same as the display order. 

2.1.26 coded representation: A data element as represented in its encoded form. 

2.1.27 coding parameters [video]: The set of user-definable parameters that characterize a coded video 
bitstream. Bitstreams are characterised by coding parameters. Decoders are characterised by the bitstreams 
that they are capable of decoding. 

2.1.28 component [video]: A matrix, block or single pel from one of the three matrices (luminance
and two chrominance) that make up a picture. 

2.1.29 compression: Reduction in the number of bits used to represent an item of data. 

2.1.30 constant bitrate coded video [video]: A compressed video bitstream with a constant
average bitrate. 

2.1.31 constant bitrate: Operation where the bitrate is constant from start to finish of the compressed 
bitstream. 

2.1.32 constrained parameters [video]: The values of the set of coding parameters defined in

2.1.33 constrained system parameter stream (CSPS) [system]: An ISO/IEC 11172
multiplexed stream for which the constraints defined in 2.4.6 of ISO/IEC 11172-1 apply.

2.1.34 CRC: Cyclic redundancy code.

2.1.35 critical band rate [audio]: Psychoacoustic function of frequency. At a given audible
frequency it is proportional to the number of critical bands below that frequency. The units of the critical 
band rate scale are Barks. 

2.1.36 critical band [audio]: Psychoacoustic measure in the spectral domain which corresponds to the
frequency selectivity of the human ear. This selectivity is expressed in Bark.

2.1.37 data element: An item of data as represented before encoding and after decoding.

2.1.38 dc-coefficient [video]: The DCT coefficient for which the frequency is zero in both
dimensions. 
itself. Of the DCT coefficients in the coded representation, only the dc-coefficients are present.

2.1.40 DCT coefficient: The amplitude of a specific cosine basis function. 

2.1.41 decoded stream: The decoded reconstruction of a compressed bitstream. 

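2.1.42 decoder input buffer [video]: The first-in first-out (FIFO) buffer specified in the video
buffering verifier.

2.1.43 decoder input rate [video]: The data rate specified in the video buffering verifier and encoded
in the coded video bitstream.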
2.1.44 decoder: An embodiment of a decoding process. 

2.1.45 decoding (process): The process defined in ISO/IEC 11172 that reads an input coded bitstream
and produces decoded pictures or audio samples.

2.1.46 decoding time-stamp; DTS [system]: A field that may be present in a packet header that
indicates the time that an access unit is decoded in the system target decoder.

2.1.47 de-emphasis [audio]: Filtering applied to an audio signal after storage or transmission to undo
a linear distortion due to emphasis.

2.1.48 dequantization [video]: The process of rescaling the quantized DCT coefficients after their
representation in the bitstream has been decoded and before they are presented to the inverse DCT.

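2.1.49 digital storage media; DSM: A digital storage or transmission device or system.

2.1.50 discrete cosine transform; DCT [video]: Either the forward discrete cosine transform or the
inverse discrete cosine transform.

2.1.51 display order [video]: The order in which the decoded pictures should be displayed. Normally
this is the same order in which they were presented at the input of the encoder.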

2.1.52 dual channel mode [audio]: A mode, where two audio channels with independent programme
contents (e.g. bilingual) are encoded within one bitstream. The coding process is the same as for the
stereo mode.

2.1.53 editing: The process by which one or more compressed bitstreams are manipulated to produce a
new compressed bitstream. Conforming edited bitstreams must meet the requirements defined in this part of
ISO/IEC 11172.

2.1.54 elementary stream [system]: A generic term for one of the coded video, coded audio or other
coded bitstreams.

2.1.55 emphasis [audio]: Filtering applied to an audio signal before storage or transmission to
improve the signal-to-noise ratio at high frequencies.

2.1.56 encoder: An embodiment of an encoding process. 

2.1.57 encoding (process): A process, not specified in ISO/IEC 11172, that reads a stream of input
pictures or audio samples and produces a valid coded bitstream as defined in ISO/IEC 11172.

2.1.58 entropy coding: Variable length lossless coding of the digital representation of a signal to
reduce redundancy.

2.1.59 fast forward playback [video]: The process of displaying a sequence, or parts of a sequence,
of pictures in display-order faster than real-time.
2.1.60 FFT: Fast Fourier Transformation. A fast algorithm for performing a discrete Fourier transform
(an orthogonal transform). 

2.1.61 filterbank [audio]: A set of band-pass filters covering the entire audio frequency range. 

2.1.62 fixed segmentation [audio]: A subdivision of the digital representation of an audio signal
into fixed segments of time. 

2.1.63 forbidden: The term "forbidden" when used in the clauses defining the coded bitstream indicates 
that the value shall never be used. This is usually to avoid emulation of start codes. 

2.1.64 forced updating [video]: The process by which macroblocks are intra-coded from time-to-time
to ensure that mismatch errors between the inverse DCT processes in encoders and decoders cannot build up
excessively.

2.1.65 forward motion vector [video]: A motion vector that is used for motion compensation from 
a reference picture at an earlier time in display order. 

2.1.66 frame [audio]: A part of the audio signal that corresponds to audio PCM samples from an 
Audio Access Unit.

2.1.67 free format [audio]: Any bitrate other than the defined bitrates that is less than the maximum
valid bitrate for each layer. 

2.1.68 future reference picture [video]: The future reference picture is the reference picture that 
occurs at a later time than the current picture in display order. 

2.1.69 granules [Layer II] [audio]: The set of 3 consecutive subband samples from all 32 subbands 
that are considered together before quantization. They correspond to 96 PCM samples. 

2.1.70 granules [Layer III] [audio]: 576 frequency lines that carry their own side information.

2.1.71 group of pictures [video]: A series of one or more coded pictures intended to assist random 
access. The group of pictures is one of the layers in the coding syntax defined in this part of ISO/IEC 
11172.

2.1.72 Hann window [audio]: A time function applied sample-by-sample to a block of audio samples 
before Fourier transformation. 

2.1.73 Huffman coding: A specific method for entropy coding. 

2.1.74 hybrid filterbank [audio]: A serial combination of subband filterbank and MDCT. 

2.1.75 IMDCT [audio]: Inverse Modified Discrete Cosine Transform.

2.1.76 intensity stereo [audio]: A method of exploiting stereo irrelevance or redundancy in 
stereophonic audio programmes based on retaining at high frequencies only the energy envelope of the right 
and left channels.

2.1.77 interlace [video]: The property of conventional television pictures where alternating lines of 
the picture represent different instances in time. 

2.1.78 intra coding [video]: Coding of a macroblock or picture that uses information only from that 
macroblock or picture. 

2.1.79 intra-coded picture; I-picture [video]: A picture coded using information only from itself. 
2.1.80 ISO/IEC 11172 (multiplexed) stream [system]: A bitstream composed of zero or more
elementary streams combined in the manner defined in ISO/IEC 11172-1.

2.1.81 joint stereo coding [audio]: Any method that exploits stereophonic irrelevance or stereophonic
redundancy.

2.1.82 joint stereo mode [audio]: A mode of the audio coding algorithm using joint stereo coding. 

2.1.83 layer [audio]: One of the levels in the coding hierarchy of the audio system defined in ISO/IEC
11172-3.

2.1.84 layer [video and systems]: One of the levels in the data hierarchy of the video and system
specifications defined in ISO/IEC 11172-1 and this part of ISO/IEC 11172.

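2.1.85 luminance (component) [video]: A matrix, block or single pel representing a monochrome
representation of the signal and related to the primary colours in the manner defined in CCIR Rec 601. The
symbol used for luminance is Y.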

2.1.86 macroblock [video]: The four 8 by 8 blocks of luminance data and the two corresponding 8 by
8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture.
Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel
values and other data elements defined in the macroblock layer of the syntax defined in this part of ISO/IEC
11172. The usage is clear from the context.

2.1.87 mapping [audio]: Conversion of an audio signal from time to frequency domain by subband filtering and/or by MDCT.

2.1.88 masking [audio]: A property of the human auditory system by which an audio signal cannot be perceived in the presence of another audio signal.

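2.1.89 masking threshold [audio]: A function in frequency and time below which an audio signal cannot be perceived by the human auditory system.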

2.1.90 MDCT [audio]: Modified Discrete Cosine Transform. 

2.1.91 motion compensation [video]: The use of motion vectors to improve the efficiency of the
prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future
reference pictures containing previously decoded pel values that are used to form the prediction error signal.

2.1.92 motion estimation [video]: The process of estimating motion vectors during the encoding
process.

2.1.93 motion vector [video]: A two-dimensional vector used for motion compensation that provides
an offset from the coordinate position in the current picture to the coordinates in a reference picture.

2.1.94 MS stereo [audio]: A method of exploiting stereo irrelevance or redundancy in stereophonic
audio programmes based on coding the sum and difference signal instead of the left and right channels.

2.1.95 non-intra coding [video]: Coding of a macroblock or picture that uses information both from
itself and from macroblocks and pictures occurring at other times.

2.1.96 non-tonal component [audio]: A noise-like component of an audio signal. 

2.1.97 Nyquist sampling: Sampling at or above twice the maximum bandwidth of a signal. 

2.1.98 pack [system]: A pack consists of a pack header followed by one or more packets. It is a layer
in the system coding syntax described in ISO/IEC 11172-1. 

2.1.99 packet data [system]: Contiguous bytes of data from an elementary stream present in a packet.

2.1.100 packet header [system]: The data structure used to convey information about the elementary
stream data contained in the packet data. 

2.1.101 packet [system]: A packet consists of a header followed by a number of contiguous bytes 
from an elementary data stream. It is a layer in the system coding syntax described in ISO/IEC 11172-1. 

2.1.102 padding [audio]: A method to adjust the average length in time of an audio frame to the
duration of the corresponding PCM samples, by conditionally adding a slot to the audio frame.

2.1.103 past reference picture [video]: The past reference picture is the reference picture that occurs 
at an earlier time than the current picture in display order. 

2.1.104 pel aspect ratio [video]: The ratio of the nominal vertical height of pel on the display to its 
nominal horizontal width. 

2.1.105 pel [video]: Picture element.

2.1.106 picture period [video]: The reciprocal of the picture rate. 

2.1.107 picture rate [video]: The nominal rate at which pictures should be output from the decoding
process.

2.1.108 picture [video]: Source, coded or reconstructed image data. A source or reconstructed picture 
consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance 
signals. The Picture layer is one of the layers in the coding syntax defined in this part of ISO/IEC 11172. 
Note that the term "picture" is always used in ISO/IEC 11172 in preference to the terms field or frame.

2.1.109 polyphase filterbank [audio]: A set of equal bandwidth filters with special phase 
interrelationships, allowing for an efficient implementation of the filterbank. 

2.1.110 prediction [video]: The use of a predictor to provide an estimate of the pel value or data 
element currently being decoded. 

2.1.111 predictive-coded picture; P-picture [video]: A picture that is coded using motion 
compensated prediction from the past reference picture. 

2.1.112 prediction error [video]: The difference between the actual value of a pel or data element and 
its predictor. 

2.1.113 predictor [video]: A linear combination of previously decoded pel values or data elements. 

2.1.114 presentation time-stamp; PTS [system]: A field that may be present in a packet header 
that indicates the time that a presentation unit is presented in the system target decoder. 

2.1.115 presentation unit; PU [system]: A decoded audio access unit or a decoded picture.

2.1.116 psychoacoustic model [audio]: A mathematical model of the masking behaviour of the 
human auditory system. 

2.1.117 quantization matrix [video]: A set of sixty-four 8-bit values used by the dequantizer. 

2.1.118 quantized DCT coefficients [video]: DCT coefficients before dequantization. A variable 
length coded representation of quantized DCT coefficients is stored as part of the compressed video 
bitstream. 

2.1.119 quantizer scalefactor [video]: A data element represented in the bitstream and used by the 
decoding process to scale the dequantization.

2.1.120 random access: The process of beginning to read and decode the coded bitstream at an arbitrary
point.

2.1.121 reference picture [video]: Reference pictures are the nearest adjacent I- or P-pictures to the
current picture in display order.

2.1.122 reorder buffer [video]: A buffer in the system target decoder for storage of a reconstructed I-
picture or a reconstructed P-picture.

2.1.123 requantization [audio]: Decoding of coded subband samples in order to recover the original quantized values.

2.1.124 reserved: The term "reserved" when used in the clauses defining the coded bitstream indicates
that the value may be used in the future for ISO/IEC defined extensions.

2.1.125 reverse playback [video]: The process of displaying the picture sequence in the reverse of display order.

2.1.126 scalefactor band [audio]: A set of frequency lines in Layer III which are scaled by one scalefactor.

2.1.127 scalefactor index [audio]: A numerical code for a scalefactor. 

2.1.128 scalefactor [audio]: Factor by which a set of values is scaled before quantization.

2.1.129 sequence header [video]: A block of data in the coded bitstream containing the coded
representation of a number of data elements. 

2.1.130 side information: Information in the bitstream necessary for controlling the decoder. 

2.1.131 skipped macroblock [video]: A macroblock for which no data are stored.

2.1.132 slice [video]: A series of macroblocks. It is one of the layers of the coding syntax defined in this part of ISO/IEC 11172.

2.1.133 slot [audio]: A slot is an elementary part in the bitstream. In Layer I a slot equals four bytes, in Layers II and III one byte.

2.1.134 source stream: A single non-multiplexed stream of samples before compression coding. 

2.1.135 spreading function [audio]: A function that describes the frequency spread of masking. 

2.1.136 start codes [system and video]: 32-bit codes embedded in the coded bitstream that are
unique. They are used for several purposes including identifying some of the layers in the coding syntax. 

2.1.137 STD input buffer [system]: A first-in first-out buffer at the input of the system target
decoder for storage of compressed data from elementary streams before decoding. 

2.1.138 stereo mode [audio]: Mode where two audio channels which form a stereo pair (left and
right) are encoded within one bitstream. The coding process is the same as for the dual channel mode. 

2.1.139 stuffing (bits); stuffing (bytes) : Code-words that may be inserted into the compressed 
bitstream that are discarded in the decoding process. Their purpose is to increase the bitrate of the stream. 

2.1.140 subband [audio]: Subdivision of the audio frequency band.

2.1.141 subband filterbank [audio]: A set of band filters covering the entire audio frequency range.
In ISO/IEC 11172-3 the subband filterbank is a polyphase filterbank.

2.1.142 subband samples [audio]: The subband filterbank within the audio encoder creates a filtered
and subsampled representation of the input audio stream. The filtered samples are called subband samples.
From 384 time-consecutive input samples, 12 time-consecutive subband samples are generated within each
of the 32 subbands.

2.1.143 syncword [audio]: A 12-bit code embedded in the audio bitstream that identifies the start of a
frame.

2.1.144 synthesis filterbank [audio]: Filterbank in the decoder that reconstructs a PCM audio signal
from subband samples.

2.1.145 system header [system]: The system header is a data structure defined in ISO/IEC 11172-1
that carries information summarising the system characteristics of the ISO/IEC 11172 multiplexed stream.

2.1.146 system target decoder; STD [system]: A hypothetical reference model of a decoding
process used to describe the semantics of an ISO/IEC 11172 multiplexed bitstream.

2.1.147 time-stamp [system]: A term that indicates the time of an event.

2.1.148 triplet [audio]: A set of 3 consecutive subband samples from one subband. A triplet from
each of the 32 subbands forms a granule.

2.1.149 tonal component [audio]: A sinusoid-like component of an audio signal. 

2.1.150 variable bitrate: Operation where the bitrate varies with time during the decoding of a
compressed bitstream.

2.1.151 variable length coding; VLC: A reversible procedure for coding that assigns shorter code-
words to frequent events and longer code-words to less frequent events.

2.1.152 video buffering verifier; VBV [video]: A hypothetical decoder that is conceptually
connected to the output of the encoder. Its purpose is to provide a constraint on the variability of the data
rate that an encoder or editing process may produce.

2.1.153 video sequence [video]: A series of one or more groups of pictures. It is one of the layers of
the coding syntax defined in this part of ISO/IEC 11172.

2.1.154 zig-zag scanning order [video]: A specific sequential ordering of the DCT coefficients from
(approximately) the lowest spatial frequency to the highest.

2.2 Symbols and abbreviations

The mathematical operators used to describe this International Standard are similar to those used in the C
programming language. However, integer division with truncation and rounding are specifically defined.
The bitwise operators are defined assuming two's-complement representation of integers. Numbering and
counting loops generally begin from zero.

2.2.1 Arithmetic operators 

+ Addition. 

-       Subtraction (as a binary operator) or negation (as a unary operator).

++      Increment.

--      Decrement.

*       Multiplication.

^       Power.

/       Integer division with truncation of the result toward zero. For example, 7/4 and -7/-4 are
        truncated to 1 and -7/4 and 7/-4 are truncated to -1.

//      Integer division with rounding to the nearest integer. Half-integer values are rounded away
        from zero unless otherwise specified. For example 3//2 is rounded to 2, and -3//2 is rounded
        to -2.

DIV     Integer division with truncation of the result towards -∞.
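The three division operators differ only in how they treat the fractional part of the quotient. The following is a minimal C sketch that reproduces the examples given above. The function names div_trunc, div_round and div_floor are hypothetical. The sketch assumes b != 0 and ignores overflow of 2 * |a|.

```c
/* "/"  : integer division, result truncated toward zero.
   C99 '/' on int already behaves this way. */
int div_trunc(int a, int b) {
    return a / b;
}

/* "//" : integer division rounded to the nearest integer,
   half-integer results rounded away from zero. */
int div_round(int a, int b) {
    int negative = (a < 0) != (b < 0);     /* sign of the exact quotient */
    int ua = a < 0 ? -a : a;
    int ub = b < 0 ? -b : b;
    int q = (2 * ua + ub) / (2 * ub);      /* floor(|a|/|b| + 1/2) */
    return negative ? -q : q;
}

/* "DIV": integer division, result truncated toward minus infinity. */
int div_floor(int a, int b) {
    int q = a / b;                         /* truncated toward zero */
    if (a % b != 0 && ((a < 0) != (b < 0)))
        q -= 1;                            /* step down for negative non-exact results */
    return q;
}
```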

| |     Absolute value. |x| = x when x > 0
                        |x| = 0 when x == 0
                        |x| = -x when x < 0

%       Modulus operator. Defined only for positive numbers.

Sign( ) Sign(x) = 1 when x > 0
                = 0 when x == 0
                = -1 when x < 0

NINT ( ) Nearest integer operator. Returns the nearest integer value to the real-valued argument. Half-
integer values are rounded away from zero.
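Sign( ) and NINT( ) can be sketched in the same way. The helper names sign_of and nint are hypothetical. The sketch avoids the C maths library and assumes that the argument of NINT fits in a long.

```c
/* Sign(x): 1, 0 or -1 */
int sign_of(double x) {
    return (x > 0) - (x < 0);
}

/* NINT(x): nearest integer, half-integer values rounded away from zero.
   Assumes the result fits in a long. */
long nint(double x) {
    long t = (long)x;                /* integer part, truncated toward zero */
    double frac = x - (double)t;     /* exact fractional part, same sign as x */
    if (frac >= 0.5)  return t + 1;
    if (frac <= -0.5) return t - 1;
    return t;
}
```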

sin Sine. 

cos Cosine. 

exp Exponential. 

√ Square root.

log10 Logarithm to base ten.

loge Logarithm to base e.

log2 Logarithm to base 2. 

2.2.2 Logical operators 

|| Logical OR.
&& Logical AND. 
! Logical NOT.

2.2.3 Relational operators 

> Greater than. 

>= Greater than or equal to. 

< Less than. 

<= Less than or equal to. 

== Equal to. 

!= Not equal to.

max [, ..., ] The maximum value in the argument list.
min [, ..., ] The minimum value in the argument list.

2.2.4 Bitwise operators 

A two's complement number representation is assumed where the bitwise operators are used.
& AND.
| OR.

>> Shift right with sign extension.

<< Shift left with zero fill.

2.2.5 Assignment 

= Assignment operator. 

2.2.6 Mnemonics 

The following mnemonics are defined to describe the different data types used in the coded bit-stream. 

bslbf          Bit string, left bit first, where "left" is the order in which bit strings are written in
               ISO/IEC 11172. Bit strings are written as a string of 1s and 0s within single quote
               marks, e.g. '1000 0001'. Blanks within a bit string are for ease of reading and have no
               significance.

ch             Channel. If ch has the value 0, the left channel of a stereo signal or the first of two
               independent signals is indicated. (Audio)



nch            Number of channels; equal to 1 for single_channel mode, 2 in other modes. (Audio)

gr             Granule of 3 * 32 subband samples in audio Layer II, 18 * 32 sub-band samples in
               audio Layer III. (Audio)

main_data      The main_data portion of the bitstream contains the scalefactors, Huffman encoded
               data, and ancillary information. (Audio)

main_data_beg  The location in the bitstream of the beginning of the main_data for the frame. The
               location is equal to the ending location of the previous frame's main_data plus one bit.
               It is calculated from the main_data_end value of the previous frame. (Audio)

part2_length   The number of main_data bits used for scalefactors. (Audio)

rpchof         Remainder polynomial coefficients, highest order first. (Audio)

sb             Subband. (Audio)

sblimit        The number of the lowest sub-band for which no bits are allocated. (Audio)

scfsi          Scalefactor selection information. (Audio)

switch_point_l Number of scalefactor band (long block scalefactor band) from which point on window
               switching is used. (Audio)

switch_point_s Number of scalefactor band (short block scalefactor band) from which point on window
               switching is used. (Audio)

uimsbf         Unsigned integer, most significant bit first.

vlclbf         Variable length code, left bit first, where "left" refers to the order in which the VLC
               codes are written.

window         Number of the actual time slot in case of block_type == 2, 0 ≤ window ≤ 2. (Audio)

The byte order of multi-byte words is most significant byte first.
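The uimsbf and bslbf conventions both consume bits in the order in which they are written, leftmost (most significant) bit first. The following is a minimal reader that assumes the bitstream is held in a byte array. BitReader and read_uimsbf are illustrative names and are not part of ISO/IEC 11172.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; pos counts bits from the start of buf. */
typedef struct {
    const uint8_t *buf;
    size_t len;   /* buffer length in bytes */
    size_t pos;   /* current bit position */
} BitReader;

/* Read n (0..32) bits as an unsigned integer, most significant bit first
   (the uimsbf convention). Bits past the end of the buffer read as 0. */
uint32_t read_uimsbf(BitReader *br, unsigned n) {
    uint32_t v = 0;
    for (unsigned k = 0; k < n; k++) {
        size_t byte = br->pos >> 3;
        unsigned bit = 7u - (unsigned)(br->pos & 7u);   /* leftmost bit first */
        uint32_t b = byte < br->len ? (uint32_t)((br->buf[byte] >> bit) & 1u) : 0u;
        v = (v << 1) | b;
        br->pos++;
    }
    return v;
}
```

For example, reading the bytes 0xA5 0x0F as fields of 4, 8 and 4 bits yields 0xA, 0x50 and 0xF.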
2.2.7 Constants 

π 3,14159265358...
e 2,71828182845... 



2.3 Method of describing bitstream syntax 

The bitstream retrieved by the decoder is described in 2.4.2. Each data item in the bitstream is in bold type. 
It is described by its name, its length in bits, and a mnemonic for its type and order of transmission. 

The action caused by a decoded data element in a bitstream depends on the value of that data element and
on data elements previously decoded. The decoding of the data elements and definition of the state variables
used in their decoding are described in 2.4.3. The following constructs are used to express the conditions
when data elements are present, and are in normal type:

Note: this syntax uses the C-code convention that a variable or expression evaluating to a non-zero value is
equivalent to a condition that is true.



while ( condition ) {            If the condition is true, then the group of data elements occurs next
    data_element                 in the data stream. This repeats until the condition is not true.
    . . .
}

do {                             The data element always occurs at least once.
    data_element                 The data element is repeated until the condition is not true.
    . . .
} while ( condition )

if ( condition ) {               If the condition is true, then the first group of data elements occurs
    data_element                 next in the data stream.
    . . .
}
else {                           If the condition is not true, then the second group of data elements
    data_element                 occurs next in the data stream.
    . . .
}

for ( expr1; expr2; expr3 ) {    expr1 is an expression specifying the initialization of the loop. Normally it
    data_element                 specifies the initial state of the counter. expr2 is a condition specifying a test
    . . .                        made before each iteration of the loop. The loop terminates when the condition
}                                is not true. expr3 is an expression that is performed at the end of each iteration
                                 of the loop; normally it increments a counter.

Note that the most common usage of this construct is as follows:

for ( i = 0; i < n; i++ ) {      The group of data elements occurs n times. Conditional constructs
    data_element                 within the group of data elements may depend on the value of the
    . . .                        loop control variable i, which is set to zero for the first occurrence,
}                                incremented to one for the second occurrence, and so forth.

As noted, the group of data elements may contain nested conditional constructs. For compactness the { }
may be omitted when only one data element follows.

data_element [ ]        data_element [ ] is an array of data. The number of data elements is indicated by
                        the context.

data_element [n]        data_element [n] is the n+1th element of an array of data.

data_element [m][n]     data_element [m][n] is the m+1,n+1th element of a two-dimensional array of
                        data.

data_element [l][m][n]  data_element [l][m][n] is the l+1,m+1,n+1th element of a three-dimensional
                        array of data.

data_element [m..n]     data_element [m..n] is the inclusive range of bits between bit m and bit n in
                        the data_element.

While the syntax is expressed in procedural terms, it should not be assumed that 2.4.3 implements a
satisfactory decoding procedure. In particular, it defines a correct and error-free input bitstream. Actual
decoders must include a means to look for start codes in order to begin decoding correctly, and to identify
errors, erasures or insertions while decoding. The methods to identify these situations, and the actions to be
taken, are not standardized.

Definition of bytealigned function 

The function bytealigned() returns 1 if the current position is on a byte boundary, that is, the next bit in the
bitstream is the first bit in a byte. Otherwise it returns 0.

Definition of nextbits function 

The function nextbits() permits comparison of a bit string with the next bits to be decoded in the
bitstream.

Definition of next_start_code function 

The next_start_code function removes any zero bit and zero byte stuffing and locates the next start code.



Syntax                                                        No. of bits   Mnemonic
next_start_code() {
    while ( !bytealigned() )
        zero_bit                                              1             "0"
    while ( nextbits() != '0000 0000 0000 0000 0000 0001' )
        zero_byte                                             8             "00000000"
}



This function checks whether the current position is byte aligned. If it is not, zero stuffing bits are present.
After that, any number of zero bytes may be present before the start code. Therefore start codes are always
bytealigned and may be preceded by any number of zero stuffing bits.
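The next_start_code() syntax can be expressed in C as shown below. This is a sketch that rests on the following assumptions:

- The bitstream is an in-memory byte array.
- The position is tracked in bits.
- A non-zero stuffing bit or byte is reported as an error (-1). This part of ISO/IEC 11172 leaves that policy to the decoder.

Stream and at_prefix are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream state; pos is measured in bits. */
typedef struct {
    const uint8_t *buf;
    size_t len;   /* bytes */
    size_t pos;   /* bits */
} Stream;

/* bytealigned(): 1 if the next bit is the first bit of a byte. */
int bytealigned(const Stream *s) {
    return (s->pos & 7u) == 0;
}

/* True if the 24 bits at the (byte-aligned) position are 0x000001. */
static int at_prefix(const Stream *s) {
    size_t i = s->pos >> 3;
    return i + 3 <= s->len && s->buf[i] == 0x00 && s->buf[i + 1] == 0x00 &&
           s->buf[i + 2] == 0x01;
}

/* next_start_code(): skip zero_bit stuffing up to the byte boundary, then
   zero_byte stuffing up to the start code prefix. Returns 0 with pos at the
   prefix, or -1 on a non-zero stuffing bit/byte or end of data. */
int next_start_code(Stream *s) {
    while (!bytealigned(s)) {
        size_t i = s->pos >> 3;
        if (i >= s->len) return -1;
        if ((s->buf[i] >> (7u - (s->pos & 7u))) & 1u) return -1;  /* zero_bit must be 0 */
        s->pos++;
    }
    while (!at_prefix(s)) {
        size_t i = s->pos >> 3;
        if (i >= s->len || s->buf[i] != 0x00) return -1;          /* zero_byte must be 0x00 */
        s->pos += 8;
    }
    return 0;
}
```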

2.4 Requirements

2.4.1 Coding structure and parameters 

Video sequence 

A coded video sequence commences with a sequence header and is followed by one or more groups of
pictures and is ended by a sequence_end_code. Immediately before each of the groups of pictures there may
be a sequence header. Within each sequence, pictures shall be decodable continuously.

In each of these repeated sequence headers all of the data elements with the permitted exception of those
defining the quantization matrices (load_intra_quantizer_matrix, load_non_intra_quantizer_matrix and
optionally intra_quantizer_matrix and non_intra_quantizer_matrix) shall have the same values as in the first
sequence header. The quantization matrices may be redefined each time that a sequence header occurs in the
bitstream. Thus the data elements load_intra_quantizer_matrix, load_non_intra_quantizer_matrix and
optionally intra_quantizer_matrix and non_intra_quantizer_matrix may have any (non-forbidden) values.

Repeating the sequence header allows the data elements of the initial sequence header to be repeated in order 
that random access into the video sequence is possible. In addition the quantization matrices may be 
changed inside the video sequence as required. 

Sequence header 

A video sequence header commences with a sequence_header_code and is followed by a series of data 
elements. 

Group of pictures 

A group of pictures is a series of one or more coded pictures intended to assist random access into the 
sequence. In the stored bitstream, the first coded picture in a group of pictures is an I-Picture. The order of 
the pictures in the coded stream is the order in which the decoder processes them in normal playback. In 
particular, adjacent B-Pictures in the coded stream are in display order. The last coded picture, in display 
order, of a group of pictures is either an I-Picture or a P-Picture. 

The following is an example of groups of pictures taken from the beginning of a video sequence. In this
example the first group of pictures contains seven pictures and subsequent groups of pictures contain nine
pictures. There are two B-pictures between successive P-pictures and also two B-pictures between
successive I- and P-pictures. Picture '1I' is used to form a prediction for picture '4P'. Pictures '4P' and '1I'
are both used to form predictions for pictures '2B' and '3B'. Therefore the order of pictures in the coded
sequence shall be '1I', '4P', '2B', '3B'. However, the decoder should display them in the order '1I', '2B', '3B',
'4P'.

At the encoder input,

 1  2  3  4  5  6  7 ||  8  9 10 11 12 13 14 15 16 || 17 18 19 20 21 22 23 24 25
 I  B  B  P  B  B  P ||  B  B  I  B  B  P  B  B  P ||  B  B  I  B  B  P  B  B  P

At the encoder output, in the stored bitstream, and at the decoder input,

 1  4  2  3  7  5  6 || 10  8  9 13 11 12 16 14 15 || 19 17 18 22 20 21 25 23 24
 I  P  B  B  P  B  B ||  I  B  B  P  B  B  P  B  B ||  I  B  B  P  B  B  P  B  B

where the double vertical bars mark the group of pictures boundaries. Note that in this example, the first
group of pictures is two pictures shorter than subsequent groups of pictures, since at the beginning of video
coding there are no B-pictures preceding the first I-Picture. However, in general, in display order, there may
be B-Pictures preceding the first I-Picture in the group of pictures, even for the first group of pictures to be
decoded.






Exhibit 18, page 85 



ISO/IEC 11172-2: 1993(E) 



©ISO/IEC 



At the decoder output, 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
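The reordering in this example follows one rule. Each I- or P-picture is moved ahead of the B-pictures that precede it in display order, because those B-pictures use it as their future reference. The following sketch reproduces the coded order shown above. coded_order is a hypothetical helper. It assumes that the display sequence does not end in B-pictures; any such trailing B-pictures would be left out of the output.

```c
#include <stddef.h>

/* Convert display order to coded order for the picture types in `types`
   ('I', 'P' or 'B', one per picture in display order). Writes 1-based display
   numbers to `out` in coded order and returns how many were written.
   B-pictures at the very end (with no following reference) are not emitted. */
size_t coded_order(const char *types, size_t n, int *out) {
    size_t written = 0;
    size_t first_held = 0;   /* first B-picture still waiting for its future reference */
    for (size_t k = 0; k < n; k++) {
        if (types[k] == 'B')
            continue;                        /* hold B-pictures back */
        out[written++] = (int)k + 1;         /* the I- or P-picture goes first */
        for (size_t j = first_held; j < k; j++)
            out[written++] = (int)j + 1;     /* then the B-pictures it completes */
        first_held = k + 1;
    }
    return written;
}
```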

A group of pictures may be of any length. A group of pictures shall contain one or more I-Pictures.
Applications requiring random access, fast-forward playback, or fast and normal reverse playback may use 
relatively short groups of pictures. Groups of pictures may also be started at scene cuts or other cases 
where motion compensation is ineffective. 

The number of consecutive B-Pictures is variable. Neither B- nor P-Pictures need be present.

A video sequence of groups of pictures that is read by the decoder may be different from the one at the 
encoder output due to editing. 

Picture 

A source or reconstructed picture consists of three rectangular matrices of eight-bit numbers; a luminance
matrix (Y), and two chrominance matrices (Cb and Cr). The Y-matrix shall have an even number of rows 
and columns, and the Cb and Cr matrices shall be one half the size of the Y-matrix in both horizontal and 
vertical dimensions. 

The Y, Cb and Cr components are related to the primary (analogue) Red, Green and Blue signals (E'R, E'G
and E'B) as described in CCIR Recommendation 601. These primary signals are gamma pre-corrected. The
assumed value of gamma is not defined in this part of ISO/IEC 11172 but may typically be in the region
approximately 2,2 to approximately 2,8. Applications which require accurate colour reproduction may
choose to specify the value of gamma more accurately, but this is outside the scope of this part of ISO/IEC
11172.
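For illustration only, the relation to CCIR Recommendation 601 is sketched below. The sketch rests on these assumptions:

- It uses the Rec. 601 luma weights (0,299, 0,587, 0,114).
- It uses the nominal 8-bit ranges: Y from 16 to 235, and Cb and Cr centred on 128 with an excursion of ±112.
- Inputs are gamma pre-corrected values in [0, 1].

rgb_to_ycbcr is a hypothetical helper. The sketch is not a normative part of this clause.

```c
/* Gamma pre-corrected R'G'B' in [0,1] to 8-bit Y, Cb, Cr, assuming the
   Rec. 601 luma weights and nominal ranges (Y 16..235, Cb/Cr 16..240).
   Results are rounded to nearest and clamped to 0..255. */
typedef struct { int y, cb, cr; } YCbCr;

static int clamp_round(double v) {
    if (v < 0.0) v = 0.0;
    if (v > 255.0) v = 255.0;
    return (int)(v + 0.5);          /* v is non-negative here, so this rounds */
}

YCbCr rgb_to_ycbcr(double r, double g, double b) {
    double ey = 0.299 * r + 0.587 * g + 0.114 * b;   /* luma E'Y */
    double pb = (b - ey) / 1.772;                    /* scaled B'-Y', range +-0.5 */
    double pr = (r - ey) / 1.402;                    /* scaled R'-Y', range +-0.5 */
    YCbCr out = { clamp_round(16.0 + 219.0 * ey),
                  clamp_round(128.0 + 224.0 * pb),
                  clamp_round(128.0 + 224.0 * pr) };
    return out;
}
```

White maps to (235, 128, 128) and black to (16, 128, 128).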

The luminance and chrominance samples are positioned as shown in figure 5, where "x" marks the position
of the luminance (Y) samples and "0" marks the position of the chrominance (Cb and Cr) samples:

Figure 5 - The position of luminance and chrominance samples.

There are four types of coded picture that use different coding methods. 

An Intra-coded picture (I-picture) is coded using information only from itself. 

A Predictive-coded picture (P-picture) is a picture which is coded using motion compensated
prediction from a past I-Picture or P-Picture.

A Bidirectionally predictive-coded picture (B-picture) is a picture which is coded using motion
compensated prediction from a past and/or future I-Picture or P-Picture.

A dc coded (D) picture is coded using information only from itself. Of the DCT coefficients only the
dc ones are present. D-Pictures shall not be in a sequence containing any other picture types.

Slice

A slice is a series of an arbitrary number of macroblocks with the order of macroblocks starting from the 
upper-left of the picture and proceeding by raster-scan order from left to right and top to bottom. The first 
and last macroblocks of a slice shall not be skipped macroblocks (see 2.4.4.4). Every slice shall contain at 
least one macroblock. Slices shall not overlap and there shall be no gaps between slices. The position of 
slices may change from picture to picture. The first slice shall start with the first macroblock in the picture
and the last slice shall end with the last macroblock in the picture. 

Macroblock 

A macroblock contains a 16-pel by 16-line section of luminance component and the spatially corresponding 8-pel by 
8-line section of each chrominance component. A macroblock has 4 luminance blocks and 2 chrominance blocks. The
term "macroblock" can refer to source or reconstructed data or to scaled, quantized coefficients. The order of blocks in a 
macroblock is top-left, top-right, bottom-left, bottom-right blocks for Y, followed by Cb and Cr. Figure 6 shows the 
arrangement of these blocks. A skipped macroblock is one for which no information is stored (see 2.4.4.4). 

[Figure: Y blocks 0 and 1 (top row) and 2 and 3 (bottom row); Cb block 4; Cr block 5]

Figure 6 - The arrangement of blocks in a macroblock.
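The block order can be captured in a small lookup. The following sketch uses hypothetical names (block_position, BlockPos). It maps the block index i, as used by block( i ) in the macroblock syntax, to its component and to its pel offset inside that component's part of the macroblock.

```c
/* Where block i (0..5) of a macroblock comes from: Y top-left, top-right,
   bottom-left, bottom-right, then Cb, then Cr. Offsets are in pels within
   the 16x16 luminance area (i < 4) or the 8x8 chrominance area (i >= 4). */
typedef enum { COMP_Y, COMP_CB, COMP_CR } Component;

typedef struct { Component comp; int x, y; } BlockPos;

BlockPos block_position(int i) {
    BlockPos p = { COMP_Y, 0, 0 };
    if (i < 4) {
        p.x = (i & 1) * 8;     /* 0, 8, 0, 8 */
        p.y = (i >> 1) * 8;    /* 0, 0, 8, 8 */
    } else {
        p.comp = (i == 4) ? COMP_CB : COMP_CR;   /* chroma blocks cover their whole 8x8 area */
    }
    return p;
}
```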

Block 

A block is an orthogonal 8-pel by 8-line section of a luminance or chrominance component.

The term "block" can refer either to source and reconstructed data or to the corresponding coded data 
elements. 

Reserved, Forbidden and Marker bit 

The terms "reserved" and "forbidden" are used in the description of some values of several fields in the coded
bitstream.

The term "reserved" indicates that the value may be used in the future for ISO/IEC-defined extensions.
The term "forbidden" indicates a value that shall never be used (usually in order to avoid emulation of start
codes).

The term "marker_bit" indicates a one bit field in which the value zero is forbidden. These marker bits are
introduced at several points in the syntax to avoid start-code emulation.




2.4.2 Specification of the coded video bitstream syntax
2.4.2.1 Start codes 

Start codes are reserved bit patterns that do not otherwise occur in the video stream. All start codes are
byte aligned.

name                                                    hexadecimal value
picture_start_code                                      00000100
slice_start_codes (including slice_vertical_positions)  00000101 through 000001AF
reserved                                                000001B0
reserved                                                000001B1
user_data_start_code                                    000001B2
sequence_header_code                                    000001B3
sequence_error_code                                     000001B4
extension_start_code                                    000001B5
reserved                                                000001B6
sequence_end_code                                       000001B7
group_start_code                                        000001B8
system start codes (see note)                           000001B9 through 000001FF

NOTE - System start codes are defined in ISO/IEC 11172-1.



The use of the start codes is defined in the following syntax description with the exception of the
sequence_error_code. The sequence_error_code has been allocated for use by the digital storage media
interface to indicate where uncorrectable errors have been detected.
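Every start code shares the 0x000001 prefix, so a decoder can classify a start code from its final byte alone. The following sketch follows the start code table. The StartCodeKind enumeration and classify_start_code are hypothetical names.

```c
/* Kinds of start code, following the table in 2.4.2.1. */
typedef enum {
    SC_PICTURE, SC_SLICE, SC_RESERVED, SC_USER_DATA, SC_SEQUENCE_HEADER,
    SC_SEQUENCE_ERROR, SC_EXTENSION, SC_SEQUENCE_END, SC_GROUP, SC_SYSTEM
} StartCodeKind;

/* `code` is the byte (0x00..0xFF) that follows the 0x000001 prefix. */
StartCodeKind classify_start_code(unsigned code) {
    if (code == 0x00) return SC_PICTURE;
    if (code <= 0xAF) return SC_SLICE;          /* 0x01..0xAF */
    switch (code) {
    case 0xB2: return SC_USER_DATA;
    case 0xB3: return SC_SEQUENCE_HEADER;
    case 0xB4: return SC_SEQUENCE_ERROR;
    case 0xB5: return SC_EXTENSION;
    case 0xB7: return SC_SEQUENCE_END;
    case 0xB8: return SC_GROUP;
    default:   break;
    }
    if (code >= 0xB9) return SC_SYSTEM;         /* defined in ISO/IEC 11172-1 */
    return SC_RESERVED;                         /* 0xB0, 0xB1, 0xB6 */
}
```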



2.4.2.2 Video sequence layer

Syntax                                                        No. of bits   Mnemonic
video_sequence() {
    next_start_code()
    do {
        sequence_header()
        do {
            group_of_pictures()
        } while ( nextbits() == group_start_code )
    } while ( nextbits() == sequence_header_code )
    sequence_end_code                                         32            bslbf
}

2.4.2.3 Sequence header

Syntax                                                        No. of bits   Mnemonic
sequence_header() {
    sequence_header_code                                      32            bslbf
    horizontal_size                                           12            uimsbf
    vertical_size                                             12            uimsbf
    pel_aspect_ratio                                          4             uimsbf
    picture_rate                                              4             uimsbf
    bit_rate                                                  18            uimsbf
    marker_bit                                                1             "1"
    vbv_buffer_size                                           10            uimsbf
    constrained_parameters_flag                               1
    load_intra_quantizer_matrix                               1
    if ( load_intra_quantizer_matrix )
        intra_quantizer_matrix[64]                            8*64          uimsbf
    load_non_intra_quantizer_matrix                           1
    if ( load_non_intra_quantizer_matrix )
        non_intra_quantizer_matrix[64]                        8*64          uimsbf
    next_start_code()
    if ( nextbits() == extension_start_code ) {
        extension_start_code                                  32            bslbf
        while ( nextbits() != '0000 0000 0000 0000 0000 0001' ) {
            sequence_extension_data                           8
        }
        next_start_code()
    }
    if ( nextbits() == user_data_start_code ) {
        user_data_start_code                                  32            bslbf
        while ( nextbits() != '0000 0000 0000 0000 0000 0001' ) {
            user_data                                         8
        }
        next_start_code()
    }
}
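The following is a sketch of reading the fixed-length part of the sequence header. It assumes that the header begins at the first byte of a buffer and uses the field widths 32, 12, 12, 4, 4, 18, 1, 10 and 1 bits in syntax order. It does not handle the quantizer matrices or the extension and user data. It also does not interpret the semantic meaning of each field value, which is defined in the semantics clause. SequenceHeader, get_bits and parse_sequence_header are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-length fields of the sequence header, in syntax order. */
typedef struct {
    unsigned horizontal_size, vertical_size;
    unsigned pel_aspect_ratio, picture_rate;
    unsigned long bit_rate;
    unsigned vbv_buffer_size, constrained_parameters_flag;
} SequenceHeader;

/* MSB-first read of n bits starting at bit *pos (no bounds checking). */
static unsigned long get_bits(const uint8_t *p, size_t *pos, unsigned n) {
    unsigned long v = 0;
    while (n--) {
        v = (v << 1) | ((p[*pos >> 3] >> (7u - (*pos & 7u))) & 1u);
        (*pos)++;
    }
    return v;
}

/* Parse the first 94 bits (caller supplies at least 12 bytes). Returns 0, or
   -1 if the start code or marker_bit is wrong. Matrices are not read. */
int parse_sequence_header(const uint8_t *p, SequenceHeader *h) {
    size_t pos = 0;
    if (get_bits(p, &pos, 32) != 0x000001B3UL) return -1;   /* sequence_header_code */
    h->horizontal_size  = (unsigned)get_bits(p, &pos, 12);
    h->vertical_size    = (unsigned)get_bits(p, &pos, 12);
    h->pel_aspect_ratio = (unsigned)get_bits(p, &pos, 4);
    h->picture_rate     = (unsigned)get_bits(p, &pos, 4);
    h->bit_rate         = get_bits(p, &pos, 18);
    if (get_bits(p, &pos, 1) != 1) return -1;               /* marker_bit */
    h->vbv_buffer_size  = (unsigned)get_bits(p, &pos, 10);
    h->constrained_parameters_flag = (unsigned)get_bits(p, &pos, 1);
    return 0;
}
```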

2.4.2.4 Group of pictures layer

Syntax                                                        No. of bits   Mnemonic
group_of_pictures() {
    group_start_code                                          32            bslbf
    time_code                                                 25
    closed_gop                                                1
    broken_link                                               1
    next_start_code()
    if ( nextbits() == extension_start_code ) {
        extension_start_code                                  32            bslbf
        while ( nextbits() != '0000 0000 0000 0000 0000 0001' ) {
            group_extension_data                              8
        }
        next_start_code()
    }
    if ( nextbits() == user_data_start_code ) {
        user_data_start_code                                  32            bslbf
        while ( nextbits() != '0000 0000 0000 0000 0000 0001' ) {
            user_data                                         8
        }
        next_start_code()
    }
    do {
        picture()
    } while ( nextbits() == picture_start_code )
}
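The 25-bit time_code is itself a packed field. The layout assumed below is drop_frame_flag 1 bit, hours 5, minutes 6, marker_bit 1, seconds 6 and pictures 6. This layout comes from the group of pictures semantics rather than from the syntax table above, so treat it as an assumption of this sketch. TimeCode and unpack_time_code are hypothetical names.

```c
/* Subfields of time_code, most significant first:
   drop_frame_flag(1) hours(5) minutes(6) marker_bit(1) seconds(6) pictures(6) */
typedef struct {
    unsigned drop_frame_flag, hours, minutes, marker_bit, seconds, pictures;
} TimeCode;

/* tc holds the 25 time_code bits in its low-order bits. */
TimeCode unpack_time_code(unsigned long tc) {
    TimeCode t;
    t.drop_frame_flag = (unsigned)(tc >> 24) & 0x01u;
    t.hours           = (unsigned)(tc >> 19) & 0x1Fu;
    t.minutes         = (unsigned)(tc >> 13) & 0x3Fu;
    t.marker_bit      = (unsigned)(tc >> 12) & 0x01u;
    t.seconds         = (unsigned)(tc >> 6)  & 0x3Fu;
    t.pictures        = (unsigned)tc & 0x3Fu;
    return t;
}
```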






2.4.2.5 Picture layer

Syntax                                                        No. of bits   Mnemonic
picture() {
    picture_start_code                                        32            bslbf
    temporal_reference                                        10            uimsbf
    picture_coding_type                                       3             uimsbf
    vbv_delay                                                 16            uimsbf
    if ( (picture_coding_type == 2) || (picture_coding_type == 3) ) {
        full_pel_forward_vector                               1
        forward_f_code                                        3             uimsbf
    }
    if ( picture_coding_type == 3 ) {
        full_pel_backward_vector                              1
        backward_f_code                                       3             uimsbf
    }
    while ( nextbits() == '1' ) {
        extra_bit_picture                                     1
        extra_information_picture                             8
    }
    extra_bit_picture                                         1
    next_start_code()
    if ( nextbits() == extension_start_code ) {
        extension_start_code                                  32            bslbf
        while ( nextbits() != '0000 0000 0000 0000 0000 0001' ) {
            picture_extension_data                            8
        }
        next_start_code()
    }
    if ( nextbits() == user_data_start_code ) {
        user_data_start_code                                  32            bslbf
        while ( nextbits() != '0000 0000 0000 0000 0000 0001' ) {
            user_data                                         8
        }
        next_start_code()
    }
    do {
        slice()
    } while ( nextbits() == slice_start_code )
}

2.4.2.6 Slice layer

Syntax                                                        No. of bits   Mnemonic
slice() {
    slice_start_code                                          32            bslbf
    quantizer_scale                                           5             uimsbf
    while ( nextbits() == '1' ) {
        extra_bit_slice                                       1
        extra_information_slice                               8
    }
    extra_bit_slice                                           1
    do {
        macroblock()
    } while ( nextbits() != '000 0000 0000 0000 0000 0000' )
    next_start_code()
}
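The slice start code carries its own vertical position. Its final byte, slice_vertical_position, lies in the range 0x01 to 0xAF given in the start code table. The sketch below relies on two assumptions taken from the slice semantics, which are outside this excerpt:

- slice_vertical_position 1 is the top macroblock row.
- The address of the first macroblock is the reset value row * mb_width - 1 plus macroblock_address_increment.

The function names are hypothetical.

```c
/* The final byte of slice_start_code (0x01..0xAF) is slice_vertical_position.
   Returns the 0-based macroblock row, or -1 if the byte is not a slice code. */
int slice_row(unsigned slice_vertical_position) {
    if (slice_vertical_position < 0x01u || slice_vertical_position > 0xAFu)
        return -1;
    return (int)slice_vertical_position - 1;
}

/* Raster address of the first macroblock in the slice, assuming the address
   counter is reset to row * mb_width - 1 at the slice start and is then
   advanced by macroblock_address_increment. */
int first_macroblock_address(unsigned slice_vertical_position, int mb_width,
                             int macroblock_address_increment) {
    int row = slice_row(slice_vertical_position);
    if (row < 0)
        return -1;
    return row * mb_width - 1 + macroblock_address_increment;
}
```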



2.4.2.7 Macroblock layer

Syntax                                                        No. of bits   Mnemonic
macroblock() {
    while ( nextbits() == '0000 0001 111' )
        macroblock_stuffing                                   11            vlclbf
    while ( nextbits() == '0000 0001 000' )
        macroblock_escape                                     11            vlclbf
    macroblock_address_increment                              1-11          vlclbf
    macroblock_type                                           1-6           vlclbf
    if ( macroblock_quant )
        quantizer_scale                                       5             uimsbf
    if ( macroblock_motion_forward ) {
        motion_horizontal_forward_code                        1-11          vlclbf
        if ( (forward_f != 1) && (motion_horizontal_forward_code != 0) )
            motion_horizontal_forward_r                       1-6           uimsbf
        motion_vertical_forward_code                          1-11          vlclbf
        if ( (forward_f != 1) && (motion_vertical_forward_code != 0) )
            motion_vertical_forward_r                         1-6           uimsbf
    }
    if ( macroblock_motion_backward ) {
        motion_horizontal_backward_code                       1-11          vlclbf
        if ( (backward_f != 1) && (motion_horizontal_backward_code != 0) )
            motion_horizontal_backward_r                      1-6           uimsbf
        motion_vertical_backward_code                         1-11          vlclbf
        if ( (backward_f != 1) && (motion_vertical_backward_code != 0) )
            motion_vertical_backward_r                        1-6           uimsbf
    }
    if ( macroblock_pattern )
        coded_block_pattern                                   3-9           vlclbf
    for ( i = 0; i < 6; i++ )
        block( i )
    if ( picture_coding_type == 4 )
        end_of_macroblock                                     1             "1"
}
2.4.2.8 Block layer

Syntax                                                     No. of bits   Mnemonic
block( i ) {
    if ( pattern_code[i] ) {
        if ( macroblock_intra ) {
            if ( i<4 ) {
                dct_dc_size_luminance                           2-7       vlclbf
                if ( dc_size_luminance != 0 )
                    dct_dc_differential                         1-8       uimsbf
            }
            else {
                dct_dc_size_chrominance                         2-8       vlclbf
                if ( dc_size_chrominance != 0 )
                    dct_dc_differential                         1-8       uimsbf
            }
        }
        else {
            dct_coeff_first                                    2-28       vlclbf
        }
        if ( picture_coding_type != 4 ) {
            while ( nextbits() != '10' )
                dct_coeff_next                                 3-28       vlclbf
            end_of_block                                          2       vlclbf
        }
    }
}
2.4.3 Semantics for the video bitstream syntax

2.4.3.1 Video sequence layer

sequence_end_code - The sequence_end_code is the bit string 000001B7 in hexadecimal. It terminates a
video sequence.

2.4.3.2 Sequence header

sequence_header_code - The sequence_header_code is the bit string 000001B3 in hexadecimal. It
identifies the beginning of a sequence header.

horizontal_size - The horizontal_size is the width of the displayable part of the luminance component
in pels. The width of the encoded luminance component in macroblocks, mb_width, is
(horizontal_size+15)/16. The displayable part of the picture is left-aligned in the encoded picture.

vertical_size - The vertical_size is the height of the displayable part of the luminance component in
pels. The height of the encoded luminance component in macroblocks, mb_height, is
(vertical_size+15)/16. The displayable part of the picture is top-aligned in the encoded picture.

pel_aspect_ratio - This is a four-bit integer defined in the following table.

pel_aspect_ratio   height/width   example
0000               forbidden
0001               1,0000         VGA etc.
0010               0,6735
0011               0,7031         16:9, 625 line
0100               0,7615
0101               0,8055
0110               0,8437         16:9, 525 line
0111               0,8935
1000               0,9157         CCIR601, 625 line
1001               0,9815
1010               1,0255
1011               1,0695
1100               1,0950         CCIR601, 525 line
1101               1,1575
1110               1,2015
1111               reserved

picture_rate - This is a four-bit integer defined in the following table.

picture_rate   pictures per second
0000           forbidden
0001           23,976
0010           24
0011           25
0100           29,97
0101           30
0110           50
0111           59,94
1000           60
1001 ... 1111  reserved

Applications and encoders should take into account the fact that 23,976, 29,97 and 59,94 are not exact
representations of the nominal picture rate. The exact values are 24 000/1 001, 30 000/1 001
and 60 000/1 001, and can be derived from CCIR Report 624-4.
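Because the nominal rates 23,976, 29,97 and 59,94 are inexact, an application may prefer to carry picture_rate as a ratio. A minimal sketch, assuming a hypothetical `picture_rate_exact` helper that returns 0/0 for the forbidden and reserved codes (that convention is also an assumption):

```c
#include <assert.h>

/* Exact picture rate as num/den for each 4-bit picture_rate code. */
typedef struct { int num; int den; } Rate;

static Rate picture_rate_exact(unsigned code) {
    static const Rate table[16] = {
        {0, 0},                          /* 0000 forbidden */
        {24000, 1001}, {24, 1}, {25, 1}, {30000, 1001},
        {30, 1}, {50, 1}, {60000, 1001}, {60, 1},
        /* 1001 ... 1111 reserved: left zero-initialised */
    };
    assert(code < 16);
    return table[code];
}
```

Keeping the ratio avoids accumulating drift when timestamps are derived by repeated addition of a rounded period.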
bit_rate - This is an integer specifying the bitrate of the bitstream measured in units of 400 bits/s,
rounded upwards. The value zero is forbidden. The value 3FFFF (11 1111 1111 1111 1111) identifies
variable bitrate operation.

marker_bit - This is one bit that shall be set to "1".

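vbv_buffer_size - This is a 10-bit integer defining the size of the VBV (video buffering verifier) buffer
needed to decode the sequence. It is defined as:

B = 16 * 1 024 * vbv_buffer_size

where B is the minimum VBV buffer size in bits required to decode the sequence (see annex C).

constrained_parameters_flag - This is a one-bit flag which is set to "1" if the following data elements
meet the constraints listed: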

horizontal_size <= 768 pels,
vertical_size <= 576 pels,
((horizontal_size+15)/16) * ((vertical_size+15)/16) <= 396,
((horizontal_size+15)/16) * ((vertical_size+15)/16) * picture_rate <= 396*25,
picture_rate <= 30 pictures/s,
forward_f_code <= 4 (see 2.4.3.4),
backward_f_code <= 4 (see 2.4.3.4).
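The constrained-parameter limits above can be checked directly from the sequence header values. This is a sketch only: `meets_constrained_parameters` is a hypothetical helper, and it takes the picture rate in pictures per second (for example 25 or 29,97) rather than the 4-bit picture_rate code.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the values satisfy every constraint in the list above. */
static bool meets_constrained_parameters(int horizontal_size, int vertical_size,
                                         double picture_rate,
                                         int forward_f_code, int backward_f_code) {
    assert(horizontal_size > 0 && vertical_size > 0);
    int mb_width  = (horizontal_size + 15) / 16;
    int mb_height = (vertical_size + 15) / 16;
    int mbs = mb_width * mb_height;              /* macroblocks per picture */
    return horizontal_size <= 768 &&
           vertical_size <= 576 &&
           mbs <= 396 &&
           mbs * picture_rate <= 396 * 25 &&     /* macroblocks per second */
           picture_rate <= 30 &&
           forward_f_code <= 4 &&
           backward_f_code <= 4;
}
```

For example, 352 x 288 at 25 pictures/s gives 22 * 18 = 396 macroblocks and 396 * 25 = 9 900 macroblocks per second, exactly at both limits, while the same size at 29,97 pictures/s exceeds the macroblock-rate limit.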

load_intra_quantizer_matrix - This is a one-bit flag which is set to "1" if an intra_quantizer_matrix
follows. If it is set to "0" then the default values defined below are used until the next occurrence of a
sequence header.

 8  16  19  22  26  27  29  34
16  16  22  24  27  29  34  37
19  22  26  27  29  34  34  38
22  22  26  27  29  34  37  40
22  26  27  29  32  35  40  48
26  27  29  32  35  40  48  58
26  27  29  34  38  46  56  69
27  29  35  38  46  56  69  83



intra_quantizer_matrix - This is a list of sixty-four 8-bit unsigned integers. The new values, stored in
the zigzag scanning order shown in 2.4.4.1, replace the default values shown above. The value zero is
forbidden. The new values shall be in effect until the next occurrence of a sequence header.

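load_non_intra_quantizer_matrix - This is a one-bit flag which is set to "1" if a
non_intra_quantizer_matrix follows. If it is set to "0" then the default values defined below are used
until the next occurrence of a sequence header.

16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16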
non_intra_quantizer_matrix - This is a list of sixty-four 8-bit unsigned integers. The new values,
stored in the zigzag scanning order shown in 2.4.4.1, replace the default values shown above. The value
zero is forbidden. The new values shall be in effect until the next occurrence of a sequence header.

extension_start_code - The extension_start_code is the bit string 000001B5 in hexadecimal. It
identifies the beginning of extension data. The extension data continue until receipt of another start code.
It is a requirement to parse extension data correctly.

sequence_extension_data - Reserved.

user_data_start_code - The user_data_start_code is the bit string 000001B2 in hexadecimal. It
identifies the beginning of user data. The user data continues until receipt of another start code.

user_data - The user_data is defined by the users for their specific applications. The user data shall not
contain a string of 23 or more zero bits.

2.4.3.3 Group of pictures layer 

group_start_code - The group_start_code is the bit string 000001B8 in hexadecimal. It identifies the
beginning of a group of pictures.

time_code - This is a 25-bit field containing the following: drop_frame_flag, time_code_hours,
time_code_minutes, marker_bit, time_code_seconds and time_code_pictures. The fields correspond to the
fields defined in the IEC standard (Publication 461) for "time and control codes for video tape recorders" (see
annex E). The code refers to the first picture in the group of pictures that has a temporal_reference of zero.
The drop_frame_flag can be set to either "0" or "1". It may be set to "1" only if the picture rate is
29,97 Hz. If it is "0" then pictures are counted assuming rounding to the nearest integral number of pictures
per second; for example, 29,97 Hz would be rounded to and counted as 30 Hz. If it is "1" then picture
numbers 0 and 1 at the start of each minute, except minutes 0, 10, 20, 30, 40 and 50, are omitted from the
count.



time_code field      range of value   bits   mnemonic
drop_frame_flag                          1
time_code_hours      0-23                5   uimsbf
time_code_minutes    0-59                6   uimsbf
marker_bit           1                   1   "1"
time_code_seconds    0-59                6   uimsbf
time_code_pictures   0-59                6   uimsbf
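The 25-bit time_code can be split into its fields with shifts and masks, taking the first transmitted bit (drop_frame_flag) as the most significant bit of the value. A sketch with a hypothetical `unpack_time_code` helper:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned drop_frame_flag, hours, minutes, marker_bit, seconds, pictures;
} TimeCode;

/* Field widths, MSB first: 1 + 5 + 6 + 1 + 6 + 6 = 25 bits. */
static TimeCode unpack_time_code(uint32_t tc) {
    TimeCode t;
    assert(tc < (1u << 25));
    t.drop_frame_flag = (tc >> 24) & 0x01;
    t.hours           = (tc >> 19) & 0x1F;
    t.minutes         = (tc >> 13) & 0x3F;
    t.marker_bit      = (tc >> 12) & 0x01;
    t.seconds         = (tc >> 6)  & 0x3F;
    t.pictures        =  tc        & 0x3F;
    return t;
}
```

A decoder may also check that marker_bit is "1" and that each field lies within the range given in the table.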



closed_gop - This is a one-bit flag which may be set to "1" if the group of pictures has been encoded
without motion vectors pointing to the previous group of pictures.

This bit is provided for use during any editing which occurs after encoding. If the previous group of pictures
is removed by editing, broken_link may be set to "1" so that a decoder may avoid displaying the B-Pictures
immediately following the first I-Picture of the group of pictures. However, if the closed_gop
bit indicates that there are no prediction references to the previous group of pictures, then the editor may
choose not to set the broken_link bit, as these B-Pictures can be correctly decoded in this case.

broken_link - This is a one-bit flag which shall be set to "0" during encoding. It is set to "1" to indicate
that the B-Pictures immediately following the first I-Picture of a group of pictures cannot be correctly
decoded because the other I-Picture or P-Picture which is used for prediction is not available (because of the
action of editing).

A decoder may use this flag to avoid displaying pictures that cannot be correctly decoded.
extension_start_code - See 2.4.3.2.
group_extension_data - Reserved.
user_data_start_code - See 2.4.3.2.
user_data - See 2.4.3.2.
2.4.3.4 Picture layer

picture_start_code - The picture_start_code is a string of 32-bits having the value 00000100 in
hexadecimal.

temporal_reference - The temporal_reference is a 10-bit unsigned integer associated with each input
picture. It is incremented by one, modulo 1024, for each input picture. For the earliest picture (in display
order) in each group of pictures, the temporal_reference is reset to zero.

The temporal_reference is assigned (in sequence) to the pictures in display order; no temporal_reference
shall be omitted from the sequence.

picture_coding_type - The picture_coding_type identifies whether a picture is an intra-coded picture (I),
predictive-coded picture (P), bidirectionally predictive-coded picture (B), or intra-coded with only DC
coefficients picture (D) according to the following table. D-pictures shall never be included in the same
video sequence as the other picture coding types.



picture_coding_type   coding method
000                   forbidden
001                   intra-coded (I)
010                   predictive-coded (P)
011                   bidirectionally-predictive-coded (B)
100                   DC intra-coded (D)
101 ... 111           reserved



vbv_delay - The vbv_delay is a 16-bit unsigned integer. For constant bitrate operation, the vbv_delay is
used to set the initial occupancy of the decoder's buffer at the start of decoding the picture so that the
decoder's buffer does not overflow or underflow. The vbv_delay measures the time needed to fill the VBV
buffer from an initially empty state at the target bit rate, R, to the correct level immediately before the
current picture is removed from the buffer.

The value of vbv_delay is the number of periods of the 90 kHz system clock that the VBV should wait after
receiving the final byte of the picture start code. It may be calculated from the state of the VBV as follows:

$$\mathrm{vbv\_delay}_n = \frac{90000 \cdot B_n^{*}}{R}$$

where:

n > 0

$B_n^{*}$ = VBV occupancy, measured in bits, immediately before removing picture n from the
buffer but after removing any group of picture layer data, sequence header data
and the picture_start_code that immediately precedes the data elements of
picture n.

$R$ = bitrate measured in bits/s. The full precision of the bitrate rather than the rounded
value encoded by the bit_rate field in the sequence header shall be used by the
encoder in the VBV model.

For non-constant bitrate operation vbv_delay shall have the value FFFF in hexadecimal.
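The vbv_delay formula can be evaluated with 64-bit integer arithmetic. The text above does not state a rounding rule, so truncating division is an assumption of this sketch, as are the hypothetical `compute_vbv_delay` name and its `constant_bitrate` flag.

```c
#include <assert.h>
#include <stdint.h>

/* vbv_delay for picture n in periods of the 90 kHz clock.
   occupancy_bits is B_n*, bitrate_bps the full-precision bitrate R. */
static uint32_t compute_vbv_delay(uint64_t occupancy_bits, uint64_t bitrate_bps,
                                  int constant_bitrate) {
    if (!constant_bitrate)
        return 0xFFFF;                 /* value required for non-constant bitrate */
    assert(bitrate_bps > 0);
    uint64_t d = (UINT64_C(90000) * occupancy_bits) / bitrate_bps;  /* truncates */
    assert(d < 0xFFFF);                /* sanity check: fits 16 bits and is not FFFF */
    return (uint32_t)d;
}
```

For example, B_n* = 360 000 bits at R = 1 800 000 bits/s corresponds to 0,2 s, i.e. 18 000 periods of the 90 kHz clock.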

full_pel_forward_vector - If set to "1", then the motion vector values decoded represent integer pel
offsets (rather than half-pel units) as reflected in the equations of 2.4.4.2.

forward_f_code - An unsigned integer taking values 1 through 7. The value zero is forbidden. The
variables forward_r_size and forward_f used in the process of decoding the forward motion vectors are derived
from forward_f_code as described in 2.4.4.2.

full_pel_backward_vector - If set to "1", then the motion vector values decoded represent integer pel
offsets (rather than half-pel units) as reflected in the equations of 2.4.4.3.

backward_f_code - An unsigned integer taking values 1 through 7. The value zero is forbidden. The
variables backward_r_size and backward_f used in the process of decoding the backward motion vectors are
derived from backward_f_code as described in 2.4.4.3.

extra_bit_picture - A bit that indicates the presence of the following extra information. If
extra_bit_picture is set to "1", extra_information_picture will follow it. If it is set to "0", there are no data
following it.
extra_information_picture - Reserved.
extension_start_code - See 2.4.3.2.
picture_extension_data - Reserved.
user_data_start_code - See 2.4.3.2.
user_data - See 2.4.3.2.

2.4.3.5 Slice layer



slice_start_code - The slice_start_code is a string of 32-bits. The first 24-bits have the value 000001
in hexadecimal and the last 8-bits are the slice_vertical_position, having a value in the range 01 through AF
hexadecimal inclusive.

slice_vertical_position - This is given by the last eight bits of the slice_start_code. It is an unsigned
integer giving the vertical position in macroblock units of the first macroblock in the slice. The
slice_vertical_position of the first row of macroblocks is one. Some slices may have the same
slice_vertical_position, since slices may start anywhere. Note that slices do not overlap and leave no gaps
between them. The maximum value of slice_vertical_position is 175.

quantizer_scale - An unsigned integer in the range 1 to 31 used to scale the reconstruction level of the
retrieved DCT coefficient levels. The decoder shall use this value until another quantizer_scale is
encountered either at the slice layer or the macroblock layer. The value zero is forbidden.

extra_bit_slice - A bit that indicates the presence of the following extra information. If extra_bit_slice is
set to "1", extra_information_slice will follow it. If it is set to "0", there are no data following it.

extra_information_slice - Reserved.
2.4.3.6 Macroblock layer

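macroblock_stuffing - This is a fixed bit string "0000 0001 111" which can be inserted by the encoder
to increase the bit rate to that required of the storage or transmission medium. It is discarded by the decoder.

macroblock_escape - The macroblock_escape is a fixed bit string "0000 0001 000" which is used when the
difference between macroblock_address and previous_macroblock_address is greater than 33. Each
macroblock_escape adds 33 to the value indicated by the following macroblock_address_increment. For
example, if there are two macroblock_escape codewords preceding the macroblock_address_increment,
then 66 is added to the value indicated by macroblock_address_increment.

macroblock_address_increment - This is a variable length coded integer, coded as per table B.1, which
indicates the difference between macroblock_address and previous_macroblock_address. The maximum
value of macroblock_address_increment is 33. Values greater than 33 can be encoded using the
macroblock_escape codeword.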
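The combined effect of macroblock_escape and macroblock_address_increment on the macroblock address can be sketched as follows. `next_macroblock_address` is a hypothetical helper, and `decoded_increment` is assumed to be the value already obtained from table B.1.

```c
#include <assert.h>

/* macroblock_address from previous_macroblock_address, the number of
   macroblock_escape codewords, and the decoded macroblock_address_increment. */
static int next_macroblock_address(int previous_macroblock_address,
                                   int escape_count, int decoded_increment) {
    assert(escape_count >= 0);
    assert(decoded_increment >= 1 && decoded_increment <= 33);
    return previous_macroblock_address + 33 * escape_count + decoded_increment;
}
```

With two escapes and a decoded increment of 5, the address advances by 66 + 5 = 71.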
The previous_macroblock_address is a variable defining the absolute position of the last non-skipped
macroblock (see 2.4.4.4 for the definition of skipped macroblocks) except at the start of a slice. At the start
of a slice, previous_macroblock_address is reset as follows:

previous_macroblock_address = (slice_vertical_position - 1) * mb_width - 1 ;

The spatial position in macroblock units of a macroblock in the picture (mb_row, mb_column) can be
computed from the macroblock_address as follows:

mb_row = macroblock_address / mb_width
mb_column = macroblock_address % mb_width

where mb_width is the number of macroblocks in one row of the picture.
NOTE - The slice_vertical_position differs from mb_row by one.
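The slice-start reset of previous_macroblock_address and the mapping from macroblock_address to (mb_row, mb_column) are plain integer arithmetic. A sketch with hypothetical helper names:

```c
#include <assert.h>

/* previous_macroblock_address at the start of a slice. */
static int slice_start_previous_address(int slice_vertical_position, int mb_width) {
    assert(slice_vertical_position >= 1 && slice_vertical_position <= 175);
    assert(mb_width > 0);
    return (slice_vertical_position - 1) * mb_width - 1;
}

/* Spatial position of a macroblock in macroblock units. */
static void macroblock_position(int macroblock_address, int mb_width,
                                int *mb_row, int *mb_column) {
    assert(macroblock_address >= 0 && mb_width > 0);
    *mb_row = macroblock_address / mb_width;
    *mb_column = macroblock_address % mb_width;
}
```

With mb_width = 22 (352 pels wide), a slice with slice_vertical_position 3 resets previous_macroblock_address to 43, so an increment of 1 addresses macroblock 44, the first macroblock of mb_row 2, consistent with the NOTE above.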

macroblock_type - Variable length coded indicator which indicates the method of coding and content of
the macroblock according to the tables B.2a through B.2d.

macroblock_quant - Derived from macroblock_type.

macroblock_motion_forward - Derived from macroblock_type.

macroblock_motion_backward - Derived from macroblock_type.

macroblock_pattern - Derived from macroblock_type.

macroblock_intra - Derived from macroblock_type.

quantizer_scale - An unsigned integer in the range 1 to 31 used to scale the reconstruction level of the
retrieved DCT coefficient levels. The value zero is forbidden. The decoder shall use this value until another
quantizer_scale is encountered either at the slice layer or the macroblock layer. The presence of
quantizer_scale is determined from macroblock_type.

motion_horizontal_forward_code - motion_horizontal_forward_code is decoded according to table
B.4. The decoded value is required (along with forward_f - see 2.4.4.2) to decide whether or not
motion_horizontal_forward_r appears in the bitstream.

motion_horizontal_forward_r - An unsigned integer (of forward_r_size bits - see 2.4.4.2) used in the
process of decoding forward motion vectors as described in 2.4.4.2.

motion_vertical_forward_code - motion_vertical_forward_code is decoded according to table B.4.
The decoded value is required (along with forward_f - see 2.4.4.2) to decide whether or not
motion_vertical_forward_r appears in the bitstream.

motion_vertical_forward_r - An unsigned integer (of forward_r_size bits - see 2.4.4.2) used in the
process of decoding forward motion vectors as described in 2.4.4.2.

motion_horizontal_backward_code - motion_horizontal_backward_code is decoded according to
table B.4. The decoded value is required (along with backward_f - see 2.4.4.2) to decide whether or not
motion_horizontal_backward_r appears in the bitstream.

motion_horizontal_backward_r - An unsigned integer (of backward_r_size bits - see 2.4.4.2) used in
the process of decoding backward motion vectors as described in 2.4.4.2.

motion_vertical_backward_code - motion_vertical_backward_code is decoded according to table B.4.
The decoded value is required (along with backward_f) to decide whether or not motion_vertical_backward_r
appears in the bitstream.

motion_vertical_backward_r - An unsigned integer (of backward_r_size bits) used in the process of
decoding backward motion vectors as described in 2.4.4.3.
coded_block_pattern - coded_block_pattern is a variable length code that is used to derive the variable
cbp according to table B.3. If coded_block_pattern is not present in the macroblock, cbp is zero. Then the
pattern_code[i] for i=0 to 5 is derived from cbp using the following:

pattern_code[i] = 0 ;
if ( cbp & (1<<(5-i)) ) pattern_code[i] = 1 ;
if ( macroblock_intra ) pattern_code[i] = 1 ;

pattern_code[0] - If 1, then the upper left luminance block is to be received in this macroblock.

pattern_code[1] - If 1, then the upper right luminance block is to be received in this macroblock.

pattern_code[2] - If 1, then the lower left luminance block is to be received in this macroblock.

pattern_code[3] - If 1, then the lower right luminance block is to be received in this macroblock.

pattern_code[4] - If 1, then the chrominance block Cb is to be received in this macroblock.

pattern_code[5] - If 1, then the chrominance block Cr is to be received in this macroblock.

end_of_macroblock - This is a bit which is set to "1" and exists only in D-Pictures.
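The derivation of pattern_code[] from cbp maps bit 5 of cbp to the upper left luminance block and bit 0 to Cr; intra macroblocks always carry all six blocks. A sketch with a hypothetical `derive_pattern_code` helper:

```c
#include <assert.h>

/* Derive pattern_code[0..5] (four luminance blocks, Cb, Cr) from cbp. */
static void derive_pattern_code(int cbp, int macroblock_intra, int pattern_code[6]) {
    assert(cbp >= 0 && cbp < 64);
    for (int i = 0; i < 6; i++) {
        pattern_code[i] = 0;
        if (cbp & (1 << (5 - i))) pattern_code[i] = 1;
        if (macroblock_intra)     pattern_code[i] = 1;
    }
}
```

For cbp = 33 (binary 100001) only the upper left luminance block and the Cr block are received.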

2.4.3.7 Block layer

dct_dc_size_luminance - The number of bits in the following dct_dc_differential code, dc_size_luminance,
is derived according to the VLC table B.5a. Note that this data element is used in intra coded blocks.

dct_dc_size_chrominance - The number of bits in the following dct_dc_differential code,
dc_size_chrominance, is derived according to the VLC table B.5b. Note that this data element is used in
intra coded blocks.

dct_dc_differential - A variable length unsigned integer. If dc_size_luminance or dc_size_chrominance
is zero, then dct_dc_differential is not present in the bitstream. dct_zz[] is the array of quantized DCT
coefficients in zigzag scanning order. dct_zz[i] for i=0..63 shall be set to zero initially. If
dct_dc_differential is present, dct_zz[0] is computed as follows:

For luminance blocks:

if ( dct_dc_differential & ( 1 << (dc_size_luminance-1)) ) dct_zz[0] = dct_dc_differential ;
else dct_zz[0] = ( (-1) << (dc_size_luminance) ) | (dct_dc_differential+1) ;

For chrominance blocks:

if ( dct_dc_differential & ( 1 << (dc_size_chrominance-1)) ) dct_zz[0] = dct_dc_differential ;
else dct_zz[0] = ( (-1) << (dc_size_chrominance) ) | (dct_dc_differential+1) ;

Note that this data element is used in intra coded blocks.



Example for dc_size_luminance = 3:

dct_dc_differential   dct_zz[0]
000                   -7
001                   -6
010                   -5
011                   -4
100                    4
101                    5
110                    6
111                    7
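The dct_zz[0] computation above sign-extends a dct_dc_differential whose leading bit is 0. The C sketch below (hypothetical `decode_dc` helper) writes `(-1) << size` as `-(1 << size)`, because left-shifting a negative value is undefined behaviour in C; on two's complement machines the two expressions have the same value. It reproduces the example table above.

```c
#include <assert.h>

/* dct_zz[0] from the dc size (luminance or chrominance) and dct_dc_differential.
   A size of zero means no differential is coded, so the result is 0. */
static int decode_dc(int dc_size, int dct_dc_differential) {
    assert(dc_size >= 0 && dc_size <= 8);
    if (dc_size == 0)
        return 0;
    if (dct_dc_differential & (1 << (dc_size - 1)))
        return dct_dc_differential;                        /* leading 1: positive */
    return (-(1 << dc_size)) | (dct_dc_differential + 1);  /* leading 0: negative */
}
```

The same function serves luminance and chrominance blocks; only the table used to decode the size differs.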



dct_coeff_first - A variable length code according to tables B.5c through B.5f for the first coefficient.
The variables run and level are derived according to these tables. The zigzag-scanned quantized DCT
coefficient list is updated as follows.

i = run ;
if ( s == 0 ) dct_zz[i] = level ;
if ( s == 1 ) dct_zz[i] = - level ;

The terms dct_coeff_first and dct_coeff_next are run-length encoded and dct_zz[i], i>=0, shall be set to
zero initially. A variable length code according to tables B.5c through B.5f is used to represent the
run-length and level of the DCT coefficients. Note that this data element is used in non-intra coded blocks.

dct_coeff_next - A variable length code according to tables B.5c through B.5f for coefficients following
the first retrieved. The variables run and level are derived according to these tables. The zigzag-scanned
quantized DCT coefficient list is updated as follows.

i = i + run + 1 ;
if ( s == 0 ) dct_zz[i] = level ;
if ( s == 1 ) dct_zz[i] = - level ;

If macroblock_intra == 1 then the term i shall be set to zero before the first dct_coeff_next of the block.
The decoding of dct_coeff_next shall not cause i to exceed 63.

end_of_block - This symbol is always used to indicate that no additional non-zero coefficients are
present. It is used even if dct_zz[63] is non-zero. Its value is the bit-string "10" as defined in table B.5c.
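Putting the update rules together, the run/level pairs of one block can be expanded into dct_zz[] as sketched below. The `RunLevel` struct and `expand_run_level` function are hypothetical; the caller is assumed to zero dct_zz[] first and, for intra blocks, to have already stored the DC term in dct_zz[0].

```c
#include <assert.h>

/* One decoded (run, level, s) triple from tables B.5c to B.5f. */
typedef struct { int run; int level; int s; } RunLevel;

/* Store the coefficients of one block into dct_zz[] (zigzag order).
   Non-intra: the first pair is dct_coeff_first (i = run).
   Intra: every pair is a dct_coeff_next, with i starting at 0. */
static void expand_run_level(const RunLevel *pairs, int count,
                             int macroblock_intra, int dct_zz[64]) {
    int i = 0;
    for (int k = 0; k < count; k++) {
        if (k == 0 && !macroblock_intra)
            i = pairs[k].run;              /* dct_coeff_first */
        else
            i = i + pairs[k].run + 1;      /* dct_coeff_next */
        assert(i >= 0 && i <= 63);         /* i shall not exceed 63 */
        dct_zz[i] = (pairs[k].s == 0) ? pairs[k].level : -pairs[k].level;
    }
}
```

For a non-intra block the pairs (0, 3, +), (2, 1, -), (0, 2, +) fill positions 0, 3 and 4 with 3, -1 and 2.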
2.4.4 The video decoding process

Compliance requirements for decoders are contained in ISO/IEC 11172-4.
2.4.4.1 Intra-coded macroblocks

In I-pictures all macroblocks are intra-coded and stored. In P-pictures and B-pictures, some macroblocks
may be intra-coded as identified by macroblock_type. Thus, macroblock_intra identifies the intra-coded
macroblocks.

The variables mb_row and mb_column locate the macroblock in the picture. They are defined in 2.4.3.6.
The definitions of dct_dc_differential and dct_coeff_next also define the zigzag-scanned quantized DCT
coefficient list, dct_zz[]. Each dct_zz[] is located in the macroblock as defined by pattern_code[].

Define dct_recon[m][n] to be the matrix of reconstructed DCT coefficients of the block, where the first index
identifies the row and the second the column of the matrix. Define dct_dc_y_past, dct_dc_cb_past and
dct_dc_cr_past to be the dct_recon[0][0] of the most recently decoded intra-coded Y, Cb and Cr blocks
respectively. The predictors dct_dc_y_past, dct_dc_cb_past and dct_dc_cr_past shall all be reset at the start
of a slice and at non-intra-coded macroblocks (including skipped macroblocks) to the value 1 024 (128*8).

Define intra_quant[m][n] to be the intra quantizer matrix that is specified in the sequence header.

Note that intra_quant[0][0] is used in the dequantizer calculations for simplicity of description, but the result
is overwritten by the subsequent calculation for the dc coefficient.

Define scan[m][n] to be the matrix defining the zigzag scanning sequence as follows:



 0   1   5   6  14  15  27  28
 2   4   7  13  16  26  29  42
 3   8  12  17  25  30  41  43
 9  11  18  24  31  40  44  53
10  19  23  32  39  45  52  54
20  22  33  38  46  51  55  60
21  34  37  47  50  56  59  61
35  36  48  49  57  58  62  63

Where n is the horizontal index and m is the vertical index.
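The scan table above follows the usual zigzag pattern, so it can also be generated by walking the anti-diagonals m + n = d with alternating direction. This generator (hypothetical `build_zigzag`) only illustrates the pattern; the table remains the definition.

```c
#include <assert.h>

/* Fill scan[m][n] (m vertical, n horizontal) with the zigzag order. */
static void build_zigzag(int scan[8][8]) {
    int k = 0;
    for (int d = 0; d <= 14; d++) {           /* anti-diagonal m + n = d */
        for (int t = 0; t <= d; t++) {
            /* even d: move up and to the right; odd d: down and to the left */
            int m = (d % 2 == 0) ? d - t : t;
            int n = d - m;
            if (m < 8 && n < 8)
                scan[m][n] = k++;
        }
    }
    assert(k == 64);
}
```

Comparing the generated array with the printed table is a cheap self-check for a hand-typed scan table.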

Define past_intra_address as the macroblock_address of the most recently retrieved intra-coded macroblock
within the slice. It shall be reset to -2 at the beginning of each slice.

Then dct_recon[m][n] shall be computed by any means equivalent to the following procedure for the first
luminance block:

for ( m=0; m<8; m++ ) {
    for ( n=0; n<8; n++ ) {
        i = scan[m][n] ;
        dct_recon[m][n] = ( 2 * dct_zz[i] * quantizer_scale * intra_quant[m][n] ) / 16 ;
        if ( ( dct_recon[m][n] & 1 ) == 0 )
            dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
        if ( dct_recon[m][n] > 2047 ) dct_recon[m][n] = 2047 ;
        if ( dct_recon[m][n] < -2048 ) dct_recon[m][n] = -2048 ;
    }
}
dct_recon[0][0] = dct_zz[0] * 8 ;
if ( ( macroblock_address - past_intra_address ) > 1 )
    dct_recon[0][0] = (128 * 8) + dct_recon[0][0] ;
else
    dct_recon[0][0] = dct_dc_y_past + dct_recon[0][0] ;
dct_dc_y_past = dct_recon[0][0] ;

Note that this process disallows even valued numbers. This has been found to prevent accumulation of 
mismatch errors. 
For the subsequent luminance blocks in the macroblock, in the order of the list defined by the array
pattern_code[]:

for ( m=0; m<8; m++ ) {
    for ( n=0; n<8; n++ ) {
        i = scan[m][n] ;
        dct_recon[m][n] = ( 2 * dct_zz[i] * quantizer_scale * intra_quant[m][n] ) / 16 ;
        if ( ( dct_recon[m][n] & 1 ) == 0 )
            dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
        if ( dct_recon[m][n] > 2047 ) dct_recon[m][n] = 2047 ;
        if ( dct_recon[m][n] < -2048 ) dct_recon[m][n] = -2048 ;
    }
}
dct_recon[0][0] = dct_dc_y_past + ( dct_zz[0] * 8 ) ;
dct_dc_y_past = dct_recon[0][0] ;

For the chrominance Cb block:

for ( m=0; m<8; m++ ) {
    for ( n=0; n<8; n++ ) {
        i = scan[m][n] ;
        dct_recon[m][n] = ( 2 * dct_zz[i] * quantizer_scale * intra_quant[m][n] ) / 16 ;
        if ( ( dct_recon[m][n] & 1 ) == 0 )
            dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
        if ( dct_recon[m][n] > 2047 ) dct_recon[m][n] = 2047 ;
        if ( dct_recon[m][n] < -2048 ) dct_recon[m][n] = -2048 ;
    }
}
dct_recon[0][0] = dct_zz[0] * 8 ;
if ( ( macroblock_address - past_intra_address ) > 1 )
    dct_recon[0][0] = (128 * 8) + dct_recon[0][0] ;
else
    dct_recon[0][0] = dct_dc_cb_past + dct_recon[0][0] ;
dct_dc_cb_past = dct_recon[0][0] ;

For the chrominance Cr block:

for ( m=0; m<8; m++ ) {
    for ( n=0; n<8; n++ ) {
        i = scan[m][n] ;
        dct_recon[m][n] = ( 2 * dct_zz[i] * quantizer_scale * intra_quant[m][n] ) / 16 ;
        if ( ( dct_recon[m][n] & 1 ) == 0 )
            dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
        if ( dct_recon[m][n] > 2047 ) dct_recon[m][n] = 2047 ;
        if ( dct_recon[m][n] < -2048 ) dct_recon[m][n] = -2048 ;
    }
}
dct_recon[0][0] = dct_zz[0] * 8 ;
if ( ( macroblock_address - past_intra_address ) > 1 )
    dct_recon[0][0] = (128 * 8) + dct_recon[0][0] ;
else
    dct_recon[0][0] = dct_dc_cr_past + dct_recon[0][0] ;
dct_dc_cr_past = dct_recon[0][0] ;

After all the blocks in the macroblock are processed:

past_intra_address = macroblock_address ;

Values in the coded data elements leading to dct_recon[0][0] < 0 or dct_recon[0][0] > 2047 are not permitted.
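The per-block intra reconstruction above can be collected into a single C function. In this sketch (hypothetical names throughout), `dc_pred` points at whichever of dct_dc_y_past, dct_dc_cb_past or dct_dc_cr_past applies, and `reset_dc` is nonzero when the DC term restarts from 128*8, i.e. for the first luminance block and for each chrominance block when macroblock_address - past_intra_address > 1. Sign() is assumed to return 1, 0 or -1 according to the sign of its argument.

```c
#include <assert.h>

/* Sign() as assumed here: 1, 0 or -1 according to the sign of x. */
static int Sign(int x) { return (x > 0) - (x < 0); }

/* Reconstruct dct_recon[][] for one intra-coded block. */
static void intra_dequant_block(const int dct_zz[64], int scan[8][8],
                                int intra_quant[8][8], int quantizer_scale,
                                int reset_dc, int *dc_pred, int dct_recon[8][8]) {
    assert(quantizer_scale >= 1 && quantizer_scale <= 31);
    for (int m = 0; m < 8; m++) {
        for (int n = 0; n < 8; n++) {
            int i = scan[m][n];
            int v = (2 * dct_zz[i] * quantizer_scale * intra_quant[m][n]) / 16;
            if ((v & 1) == 0)          /* even values move one step toward zero */
                v -= Sign(v);
            if (v > 2047)  v = 2047;   /* saturate to [-2048, 2047] */
            if (v < -2048) v = -2048;
            dct_recon[m][n] = v;
        }
    }
    /* The DC term is coded differentially and overwrites the value above. */
    dct_recon[0][0] = (reset_dc ? 128 * 8 : *dc_pred) + dct_zz[0] * 8;
    *dc_pred = dct_recon[0][0];
}
```

For a first luminance block the caller passes `reset_dc` = (macroblock_address - past_intra_address > 1); for the three subsequent luminance blocks it passes 0, matching the procedure above.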

Once the DCT coefficients are reconstructed, the inverse DCT transform defined in annex A shall be applied
to obtain the inverse transformed pel values in the range [-256, 255]. These pel values shall be limited to
the range [0, 255] and placed in the luminance and chrominance matrices in the positions defined by
mb_row, mb_column and pattern_code[].



2.4.4.2 Predictive-coded macroblocks in P-pictures

Predictive-coded macroblocks in P-Pictures are decoded in two steps.





forward_r_size and forward_f are derived from forward_f_code as follows:

forward_r_size = forward_f_code - 1
forward_f = 1 << forward_r_size

if ( ( forward_f == 1 ) || ( motion_horizontal_forward_code == 0 ) ) {
    complement_horizontal_forward_r = 0 ;
} else {
    complement_horizontal_forward_r = forward_f - 1 - motion_horizontal_forward_r ;
}

if ( ( forward_f == 1 ) || ( motion_vertical_forward_code == 0 ) ) {
    complement_vertical_forward_r = 0 ;
} else {
    complement_vertical_forward_r = forward_f - 1 - motion_vertical_forward_r ;
}

right_little = motion_horizontal_forward_code * forward_f ;
if ( right_little == 0 ) {
    right_big = 0 ;
} else {
    if ( right_little > 0 ) {
        right_little = right_little - complement_horizontal_forward_r ;
        right_big = right_little - ( 32 * forward_f ) ;
    } else {
        right_little = right_little + complement_horizontal_forward_r ;
        right_big = right_little + ( 32 * forward_f ) ;
    }
}
down_little = motion_vertical_forward_code * forward_f ;
if ( down_little == 0 ) {
    down_big = 0 ;
} else {
    if ( down_little > 0 ) {
        down_little = down_little - complement_vertical_forward_r ;
        down_big = down_little - ( 32 * forward_f ) ;
    } else {
        down_little = down_little + complement_vertical_forward_r ;
        down_big = down_little + ( 32 * forward_f ) ;
    }
}

max = ( 16 * forward_f ) - 1 ;
min = ( -16 * forward_f ) ;

new_vector = recon_right_for_prev + right_little ;
if ( ( new_vector <= max ) && ( new_vector >= min ) )
    recon_right_for = recon_right_for_prev + right_little ;
else
    recon_right_for = recon_right_for_prev + right_big ;
recon_right_for_prev = recon_right_for ;
if ( full_pel_forward_vector ) recon_right_for = recon_right_for << 1 ;

new_vector = recon_down_for_prev + down_little ;
if ( ( new_vector <= max ) && ( new_vector >= min ) )
    recon_down_for = recon_down_for_prev + down_little ;
else
    recon_down_for = recon_down_for_prev + down_big ;
recon_down_for_prev = recon_down_for ;
if ( full_pel_forward_vector ) recon_down_for = recon_down_for << 1 ;
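Because the horizontal and vertical components follow the same steps, the reconstruction can be written once per component. A sketch with hypothetical names: `motion_code` is the value decoded from table B.4, `motion_r` the fixed-length residual (ignored when it is not present), and `prev` points at recon_right_for_prev or recon_down_for_prev. The final doubling for full-pel vectors is written as `* 2` rather than `<< 1` to avoid left-shifting a negative value in C.

```c
#include <assert.h>

/* Reconstruct one motion vector component (horizontal or vertical).
   *prev is updated before the full-pel doubling, as in the text. */
static int reconstruct_mv_component(int f_code, int full_pel, int motion_code,
                                    int motion_r, int *prev) {
    assert(f_code >= 1 && f_code <= 7);
    assert(motion_code >= -16 && motion_code <= 16);
    int r_size = f_code - 1;
    int f = 1 << r_size;
    int complement_r = (f == 1 || motion_code == 0) ? 0 : f - 1 - motion_r;

    int little = motion_code * f;
    int big = 0;
    if (little > 0) {
        little -= complement_r;
        big = little - 32 * f;
    } else if (little < 0) {
        little += complement_r;
        big = little + 32 * f;
    }

    int max = 16 * f - 1;
    int min = -16 * f;
    int new_vector = *prev + little;
    int recon = (new_vector <= max && new_vector >= min) ? *prev + little
                                                         : *prev + big;
    *prev = recon;
    if (full_pel)
        recon = recon * 2;    /* same value as recon << 1 in the text */
    return recon;
}
```

With f_code = 1, a previous value of 3 and a decoded motion code of 14, the raw sum 17 exceeds max = 15, so the decoder uses right_big and wraps to 3 + (14 - 32) = -15.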

The whole-pel components of the motion vector for the macroblock, right_for and down_for, and the half-pel
flags, right_half_for and down_half_for, are computed as follows:

For luminance:

right_for = recon_right_for >> 1 ;
down_for = recon_down_for >> 1 ;
right_half_for = recon_right_for - ( 2 * right_for ) ;
down_half_for = recon_down_for - ( 2 * down_for ) ;

For chrominance:

right_for = ( recon_right_for / 2 ) >> 1 ;
down_for = ( recon_down_for / 2 ) >> 1 ;
right_half_for = recon_right_for / 2 - ( 2 * right_for ) ;
down_half_for = recon_down_for / 2 - ( 2 * down_for ) ;
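The whole-pel and half-pel split can be sketched as below (hypothetical `split_vector` helper). The `>> 1` above is taken to be an arithmetic shift, i.e. floor division by two; because `>>` on a negative value is implementation-defined in C, the sketch uses an explicit floor division instead, while `/` keeps C's truncation toward zero, assumed here to match the integer division of the text.

```c
#include <assert.h>

/* Floor division by 2: the value of x >> 1 for an arithmetic shift. */
static int floor_half(int x) { return (x >= 0) ? x / 2 : -((-x + 1) / 2); }

/* Split a reconstructed component into a whole-pel offset and a half-pel flag.
   For chrominance the vector is first halved with '/', truncating toward zero. */
static void split_vector(int recon, int chroma, int *whole, int *half) {
    int v = chroma ? recon / 2 : recon;
    *whole = floor_half(v);
    *half = v - 2 * (*whole);
    assert(*half == 0 || *half == 1);
}
```

A luminance component of -3 half-pels becomes a whole-pel offset of -2 plus one half pel, so the half flag is never negative.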



Motion vectors leading to references outside a reference picture's boundaries are not allowed. 

if ( ( ! right_half_for ) && ( ! down_half_for ) )
    pel[i][j] = pel_past[i+down_for][j+right_for] ;

if ( ( ! right_half_for ) && down_half_for )
    pel[i][j] = ( pel_past[i+down_for][j+right_for] +
                  pel_past[i+down_for+1][j+right_for] ) // 2 ;

if ( right_half_for && ( ! down_half_for ) )
    pel[i][j] = ( pel_past[i+down_for][j+right_for] +
                  pel_past[i+down_for][j+right_for+1] ) // 2 ;

if ( right_half_for && down_half_for )
    pel[i][j] = ( pel_past[i+down_for][j+right_for] + pel_past[i+down_for+1][j+right_for] +
                  pel_past[i+down_for][j+right_for+1] + pel_past[i+down_for+1][j+right_for+1] ) // 4 ;
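The four prediction cases reduce to averaging one, two or four reference pels. Assuming the usual meaning of `//` in this part of ISO/IEC 11172 (rounding to the nearest integer, with halves rounded away from zero), and since pel values are non-negative, it can be implemented as `(sum + n/2) / n`. The row-major reference layout with a `stride`, and the `predict_pel` and `div_round` names, are assumptions of this sketch; bounds checking is left out because vectors pointing outside the reference picture are not allowed.

```c
#include <assert.h>

/* '//' for non-negative sums: round to nearest, halves away from zero. */
static int div_round(int sum, int n) {
    assert(sum >= 0 && n > 0);
    return (sum + n / 2) / n;
}

/* Predictor pel[i][j] from a reference picture stored row-major. */
static int predict_pel(const unsigned char *ref, int stride, int i, int j,
                       int down, int right, int down_half, int right_half) {
    const unsigned char *p = ref + (i + down) * stride + (j + right);
    if (!right_half && !down_half) return p[0];
    if (!right_half && down_half)  return div_round(p[0] + p[stride], 2);
    if (right_half && !down_half)  return div_round(p[0] + p[1], 2);
    return div_round(p[0] + p[stride] + p[1] + p[stride + 1], 4);
}
```

For instance, averaging 10 and 11 gives 10,5, which rounds to 11, and the four-pel average of 10, 11, 20 and 21 (15,5) rounds to 16.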

Define non_intra_quant[m][n] to be the non-intra quantizer matrix that is specified in the sequence header.

The DCT coefficients for each block present in the macroblock shall be reconstructed by any means
equivalent to the following procedure:

for ( m=0; m<8; m++ ) {
    for ( n=0; n<8; n++ ) {
        i = scan[m][n] ;
        dct_recon[m][n] = ( ( ( 2 * dct_zz[i] ) + Sign(dct_zz[i]) ) *
                            quantizer_scale * non_intra_quant[m][n] ) / 16 ;
        if ( ( dct_recon[m][n] & 1 ) == 0 )
            dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
        if ( dct_recon[m][n] > 2047 ) dct_recon[m][n] = 2047 ;
        if ( dct_recon[m][n] < -2048 ) dct_recon[m][n] = -2048 ;
        if ( dct_zz[i] == 0 )
            dct_recon[m][n] = 0 ;
    }
}

dct_recon[m][n] = 0 for all m, n in skipped macroblocks and when pattern_code[i] = 0.
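The non-intra reconstruction differs from the intra case in the extra Sign(dct_zz[i]) term and in forcing zero coefficients to stay zero. A sketch with hypothetical names, assuming Sign() returns 1, 0 or -1 according to the sign of its argument:

```c
#include <assert.h>

/* Sign() as assumed here: 1, 0 or -1 according to the sign of x. */
static int Sign(int x) { return (x > 0) - (x < 0); }

/* Reconstruct dct_recon[][] for one non-intra block. */
static void non_intra_dequant_block(const int dct_zz[64], int scan[8][8],
                                    int non_intra_quant[8][8], int quantizer_scale,
                                    int dct_recon[8][8]) {
    assert(quantizer_scale >= 1 && quantizer_scale <= 31);
    for (int m = 0; m < 8; m++) {
        for (int n = 0; n < 8; n++) {
            int i = scan[m][n];
            int v = ((2 * dct_zz[i] + Sign(dct_zz[i])) * quantizer_scale *
                     non_intra_quant[m][n]) / 16;
            if ((v & 1) == 0)          /* even values move one step toward zero */
                v -= Sign(v);
            if (v > 2047)  v = 2047;   /* saturate to [-2048, 2047] */
            if (v < -2048) v = -2048;
            if (dct_zz[i] == 0)        /* zero coefficients stay zero */
                v = 0;
            dct_recon[m][n] = v;
        }
    }
}
```

With the default non-intra matrix (all 16) and quantizer_scale = 2, a level of 1 reconstructs to (2 + 1) * 2 = 6, which the oddification step turns into 5.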

Once the DCT coefficients are reconstructed, the inverse DCT transform defined in annex A shall be applied
to obtain the inverse transformed pel values in the interval [-256, 255]. The inverse DCT pel values shall
be added to the pel[i][j] which were computed above using the motion vectors. The result of the addition
shall be limited to the interval [0, 255]. The location of the pels is determined from mb_row, mb_column
and the pattern_code list.

2.4.4.3 Predictive-coded macroblocks in B-pictures

Predictive-coded macroblocks in B-Pictures are decoded in four steps.

First, the value of the forward motion vector for the macroblock is reconstructed from the retrieved forward
motion vector information, and the forward motion vector reconstructed for the previous macroblock, using
the same procedure as for calculating the forward motion vector in P-pictures. However, for B-pictures the
previous reconstructed motion vectors shall be reset only for the first macroblock in a slice, or when the
last macroblock that was decoded was an intra-coded macroblock. If no forward motion vector data exists for
the current macroblock, the motion vectors shall be obtained by:

recon_right_for = recon_right_for_prev ;
recon_down_for = recon_down_for_prev ;

Second, the value of the backward motion vector for the macroblock shall be reconstructed from the
retrieved backward motion vector information, and the backward motion vector reconstructed for the
previous macroblock, using the same procedure as for calculating the forward motion vector in B-pictures.
In this procedure, the variables needed to find the backward motion vector are substituted for the variables
needed to find the forward motion vector. The variables and coded data elements used to calculate the
backward motion vector are:

recon_right_back_prev, recon_down_back_prev, backward_f_code, full_pel_backward_vector,
motion_horizontal_backward_code, motion_horizontal_backward_r,
motion_vertical_backward_code, motion_vertical_backward_r.
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backwanLr_size and backward_f are derived from backwaidJLcode as follows: 

backward jcsize = teckward_f_code - 1 
backward./ = 1 « backwakLrjsize 

The following variables result from applying the algorithm in 2.4.4.2, modified as described in the 
previous paragraphs in this clause: 

right_for    right_half_for    down_for    down_half_for
right_back   right_half_back   down_back   down_half_back

They define the integral and half-pel values of the rightward and downward components of the forward motion
vector (which references the past picture in display order) and the backward motion vector (which references
the future picture in display order).
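As an illustration, the following C sketch reconstructs one component of a motion vector and splits it into its integral and half-pel parts. It assumes the reconstruction procedure of 2.4.4.2 (not reproduced in this clause) as we understand it, including the wrap-around into the range [-16f, 16f - 1] and the doubling applied when full_pel_vector is set; the helper names reconstruct_component and split_half_pel are hypothetical.

```c
#include <assert.h>

/* Reconstruct one motion vector component (e.g. recon_right_back) in half-pel
 * units, following our reading of the 2.4.4.2 procedure. *prev plays the role
 * of recon_right_back_prev (or recon_down_back_prev) and is updated. */
static int reconstruct_component(int f_code, int motion_code, int motion_r,
                                 int full_pel_vector, int *prev)
{
    assert(f_code >= 1 && f_code <= 7);
    int r_size = f_code - 1;          /* backward_r_size = backward_f_code - 1 */
    int f = 1 << r_size;              /* backward_f = 1 << backward_r_size     */

    int complement_r = (f == 1 || motion_code == 0) ? 0 : f - 1 - motion_r;
    int little = motion_code * f;
    int big = 0;
    if (little > 0) {
        little -= complement_r;
        big = little - 32 * f;
    } else if (little < 0) {
        little += complement_r;
        big = little + 32 * f;
    }

    /* Keep the vector inside [-16f, 16f - 1]; otherwise use the wrapped value. */
    int max = 16 * f - 1, min = -16 * f;
    int recon = *prev + little;
    if (recon > max || recon < min)
        recon = *prev + big;
    *prev = recon;                    /* predictor for the next macroblock */
    return full_pel_vector ? 2 * recon : recon;
}

/* Split a half-pel value into its integral part (e.g. right_back) and its
 * half-pel flag (e.g. right_half_back); floor division handles negatives. */
static void split_half_pel(int recon, int *whole, int *half)
{
    *whole = (recon >= 0) ? recon / 2 : -((-recon + 1) / 2);
    *half = recon - 2 * *whole;
}
```

The same function serves the forward vector when called with forward_f_code and the forward variables; the caller resets *prev to zero at the start of each slice and after an intra-coded macroblock, as required above.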

Third, the predictors of the pel values of the block being decoded, pel[][], are calculated. If only forward
motion vector information was retrieved for the macroblock, then pel[][] of the decoded picture shall be
calculated according to the formulas in 2.4.4.2. If only backward motion vector information was retrieved
for the macroblock, then pel[][] of the decoded picture shall be calculated according to the formulas in the
predictive-coded macroblock clause, with "back" replacing "for", and pel_future[][] replacing pel_past[][]. If
both forward and backward motion vector information is retrieved, then let pel_for[][] be the value
calculated from the past picture by use of the reconstructed forward motion vector, and let pel_back[][] be
the value calculated from the future picture by use of the reconstructed backward motion vector. Then the
value of pel[][] shall be calculated by:

pel[][] = ( pel_for[][] + pel_back[][] ) // 2 ;
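The "//" operator in this formula is not C integer division. The sketch below assumes the definition given with the arithmetic operators earlier in this part (integer division with rounding to the nearest integer, half-integer values rounded away from zero); the function names div_round and average_pel are hypothetical.

```c
#include <assert.h>

/* "//": integer division rounding to the nearest integer, with half-integer
 * results rounded away from zero (b must be positive). */
static int div_round(int a, int b)
{
    assert(b > 0);
    return (a >= 0) ? (2 * a + b) / (2 * b) : -((2 * -a + b) / (2 * b));
}

/* Bidirectional prediction of one pel from the forward and backward predictors. */
static int average_pel(int pel_for, int pel_back)
{
    return div_round(pel_for + pel_back, 2);
}
```

For example, predictors 3 and 4 average to 4 rather than the 3 that plain C division would give.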

Define non_intra_quant[m][n] to be the non-intra quantizer matrix that is specified in the sequence header. 

Fourth, the DCT coefficients for each block present in the macroblock shall be reconstructed by any means
equivalent to the following procedure:

for ( m=0; m<8; m++ ) {
    for ( n=0; n<8; n++ ) {
        i = scan[m][n] ;
        dct_recon[m][n] = ( ( (2 * dct_zz[i]) + Sign(dct_zz[i]) ) *
                            quantizer_scale * non_intra_quant[m][n] ) / 16 ;
        if ( ( dct_recon[m][n] & 1 ) == 0 )
            dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
        if ( dct_recon[m][n] > 2047 ) dct_recon[m][n] = 2047 ;
        if ( dct_recon[m][n] < -2048 ) dct_recon[m][n] = -2048 ;
        if ( dct_zz[i] == 0 )
            dct_recon[m][n] = 0 ;
    }
}

dct_recon[m][n] = 0 for all m, n in skipped macroblocks and when pattern_code[i] = 0.
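To make the arithmetic of the loop easy to check, here is a single-coefficient version in C. The function name dequant_non_intra is hypothetical, Sign() returns +1, 0 or -1 as in the procedure above, and the parity test is written with % 2 instead of & 1 so that it does not depend on the representation of negative integers.

```c
#include <assert.h>

static int Sign(int v) { return (v > 0) - (v < 0); }

/* Non-intra inverse quantization of one coefficient: scale, force an even
 * result towards zero to make it odd (mismatch control), then saturate. */
static int dequant_non_intra(int level, int quantizer_scale, int weight)
{
    assert(quantizer_scale >= 1 && quantizer_scale <= 31);
    if (level == 0)
        return 0;                     /* zero coefficients stay zero */
    int v = ((2 * level + Sign(level)) * quantizer_scale * weight) / 16;
    if (v % 2 == 0)
        v -= Sign(v);
    if (v > 2047) v = 2047;
    if (v < -2048) v = -2048;
    return v;
}
```

With a flat weight of 16 and quantizer_scale = 8, a level of 1 gives ((2 + 1) * 8 * 16) / 16 = 24, which is even and therefore becomes 23.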

Once the DCT coefficients are reconstructed, the inverse DCT transform defined in annex A shall be applied
to obtain the inverse transformed pel values in the range [-256, 255]. The inverse DCT pel values shall be
added to pel[][], which were computed above from the motion vectors. The result of the addition shall be
limited to the interval [0, 255]. The location of the pels is determined from mb_row, mb_column and the
pattern_code list.
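The final addition and limiting step can be sketched in C as follows; the function names are hypothetical and the row-major 8 by 8 layout of the block is an assumption of the sketch.

```c
#include <assert.h>

/* Add one inverse-DCT value (nominally in [-256, 255]) to the prediction pel
 * and limit the sum to [0, 255]. */
static unsigned char reconstruct_pel(int prediction, int residual)
{
    assert(prediction >= 0 && prediction <= 255);
    int v = prediction + residual;
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return (unsigned char)v;
}

/* Apply the same rule to one 8x8 block in place. A block whose pattern_code
 * entry is 0 has an all-zero residual, so its pels keep the predicted values. */
static void reconstruct_block(unsigned char pel[8][8], int idct[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            pel[i][j] = reconstruct_pel(pel[i][j], idct[i][j]);
}
```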

2.4.4.4 Skipped macroblocks 

For some macroblocks there are no coded data; that is, neither motion vector information nor DCT
information is available to the decoder. These macroblocks are called skipped macroblocks and are indicated
when the macroblock_address_increment is greater than 1.

In I-pictures, all macroblocks shall be coded and there shall be no skipped macroblocks. 
In P-pictures, the skipped macroblock is defined to be a macroblock with a reconstructed motion vector
equal to zero and no DCT coefficients. 

In B-pictures, the skipped macroblock is defined to have the same macroblock_type (forward, backward, or 
both motion vectors) as the prior macroblock, differential motion vectors equal to zero, and no DCT
coefficients. In a B-picture, a skipped macroblock shall not follow an intra-coded macroblock. 
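A macroblock_address_increment of k therefore implies k - 1 skipped macroblocks between the previous macroblock and the current one. The C sketch below lists their positions; it assumes macroblocks are numbered in raster order, so that mb_row = address / mb_width and mb_column = address % mb_width, and the function name skipped_macroblocks is hypothetical.

```c
#include <assert.h>

/* List the macroblocks skipped before the current one. Fills at most max_out
 * positions and returns the number of skipped macroblocks (increment - 1). */
static int skipped_macroblocks(int prev_address, int increment, int mb_width,
                               int rows[], int cols[], int max_out)
{
    assert(increment >= 1 && mb_width > 0 && max_out >= 0);
    int n = 0;
    for (int a = prev_address + 1; a < prev_address + increment; a++) {
        if (n < max_out) {
            rows[n] = a / mb_width;   /* mb_row    */
            cols[n] = a % mb_width;   /* mb_column */
        }
        n++;
    }
    return n;
}
```

In a P-picture each reported position is then filled from the co-located pels of the past picture (zero motion vector, no DCT coefficients); in a B-picture the macroblock_type and reconstructed motion vectors of the prior macroblock are reused.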

2.4.4.5 Forced updating 

This function is achieved by forcing the use of an intra-coded macroblock. The update pattern is not 
defined. For control of accumulation of IDCT mismatch error, each macroblock shall be intra-coded at least 
once per every 132 times it is coded in a P-picture without an intervening I-picture.
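Because the update pattern is not defined, an encoder may meet this limit however it likes. One simple approach, sketched here with hypothetical names, keeps a per-macroblock count of consecutive non-intra codings in P-pictures, resets it on intra coding or at an I-picture, and forces intra coding once 131 non-intra codings have accumulated, so that no run of 132 codings lacks an intra-coded macroblock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_P_CODINGS 132   /* refresh limit stated in 2.4.4.5 */

/* True when the next P-picture coding of this macroblock must be intra. */
static bool must_force_intra(int non_intra_run)
{
    assert(non_intra_run >= 0 && non_intra_run < MAX_P_CODINGS);
    return non_intra_run >= MAX_P_CODINGS - 1;
}

/* Update the counter after the macroblock has been coded in a P-picture. */
static int update_run(int non_intra_run, bool coded_intra)
{
    return coded_intra ? 0 : non_intra_run + 1;
}
```

The caller sets every counter to zero whenever an I-picture is coded; whether skipped macroblocks count as codings is left to the caller.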
Annex A

(normative) 

8 by 8 Inverse discrete cosine transform 

The 8 by 8 inverse discrete cosine transform for I-pictures and P-pictures shall conform to IEEE Draft
Standard, P1180/D2, July 18, 1990. For B-pictures this specification may also be applied but may be
unnecessarily stringent. Note that clause 2.3 of P1180/D2, "Considerations of Specifying IDCT Mismatch
Errors", requires the specification of periodic intra-coding in order to control the accumulation of mismatch
errors. The maximum refresh period requirement for this part of ISO/IEC 11172 shall be 132 intra-coded
pictures or predictive-coded pictures as stated in 2.4.4.5, which is the same as indicated in P1180/D2 for
visual telephony according to CCITT Recommendation H.261 [5].
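P1180 measures an IDCT implementation against a high-precision reference transform. The C sketch below computes the 8 by 8 inverse DCT directly from its definition in double precision, an O(8^4) loop chosen for clarity rather than speed. The array layout F[v][u] (v vertical, u horizontal frequency), the round-half-away-from-zero rule and the saturation to [-256, 255] are assumptions of this sketch, not text from P1180/D2; cosines come from a constant table so the code needs no maths library.

```c
#include <assert.h>

/* cos(k * pi / 16) for any integer k >= 0, from a table of constants. */
static double cos16(int k)
{
    static const double c[9] = {
        1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
        0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
        0.19509032201612825, 0.0
    };
    assert(k >= 0);
    k %= 32;                                /* period 2*pi               */
    if (k > 16) k = 32 - k;                 /* cos(2*pi - a) = cos(a)    */
    return (k <= 8) ? c[k] : -c[16 - k];    /* cos(pi - a) = -cos(a)     */
}

/* f(y,x) = 1/4 sum_v sum_u C(u) C(v) F(v,u) cos((2x+1)u pi/16) cos((2y+1)v pi/16),
 * with C(0) = 1/sqrt(2) and C(k) = 1 otherwise, rounded and saturated. */
static void reference_idct(int F[8][8], int f[8][8])
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++) {
                for (int u = 0; u < 8; u++) {
                    double cu = (u == 0) ? cos16(4) : 1.0;   /* 1/sqrt(2) */
                    double cv = (v == 0) ? cos16(4) : 1.0;
                    sum += cu * cv * F[v][u] * cos16((2 * x + 1) * u)
                                             * cos16((2 * y + 1) * v);
                }
            }
            sum /= 4.0;
            long r = (sum >= 0.0) ? (long)(sum + 0.5) : -(long)(-sum + 0.5);
            if (r > 255) r = 255;
            if (r < -256) r = -256;
            f[y][x] = (int)r;
        }
    }
}
```

A block whose only nonzero coefficient is F[0][0] = 64 decodes to a flat block of value 8, since 64 * (1/2) / 4 = 8.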
Annex B

(normative) 

Variable length code tables 



Introduction 

This annex contains the variable length code tables for macroblock addressing, macroblock type,
macroblock pattern, motion vectors, and DCT coefficients.

B.1 Macroblock addressing 

Table B.1. - Variable length codes for macroblock_address_increment.

macroblock_address_increment
VLC code           increment value
1                  1
011                2
010                3
0011               4
0010               5
0001 1             6
0001 0             7
0000 111           8
0000 110           9
0000 1011          10
0000 1010          11
0000 1001          12
0000 1000          13
0000 0111          14
0000 0110          15
0000 0101 11       16
0000 0101 10       17
0000 0101 01       18
0000 0101 00       19
0000 0100 11       20
0000 0100 10       21
0000 0100 011      22
0000 0100 010      23
0000 0100 001      24
0000 0100 000      25
0000 0011 111      26
0000 0011 110      27
0000 0011 101      28
0000 0011 100      29
0000 0011 011      30
0000 0011 010      31
0000 0011 001      32
0000 0011 000      33
0000 0001 111      macroblock_stuffing
0000 0001 000      macroblock_escape
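Because the codes in Table B.1 form a prefix-free set, a decoder can match them one at a time against the incoming bits. The C sketch below works on a string of '0'/'1' characters for readability; the names decode_mba_increment, MBA_STUFFING, MBA_ESCAPE and MBA_INVALID are hypothetical, and a practical decoder would more likely index a lookup table with the next 11 bits.

```c
#include <assert.h>
#include <string.h>

enum { MBA_STUFFING = 0, MBA_ESCAPE = -1, MBA_INVALID = -2 };

/* Table B.1 without spaces; entry i is the code for increment i + 1. */
static const char *const mba_codes[33] = {
    "1", "011", "010", "0011", "0010", "00011", "00010", "0000111",
    "0000110", "00001011", "00001010", "00001001", "00001000",
    "00000111", "00000110", "0000010111", "0000010110", "0000010101",
    "0000010100", "0000010011", "0000010010", "00000100011",
    "00000100010", "00000100001", "00000100000", "00000011111",
    "00000011110", "00000011101", "00000011100", "00000011011",
    "00000011010", "00000011001", "00000011000"
};

/* Returns the increment (1..33), MBA_STUFFING, MBA_ESCAPE or MBA_INVALID;
 * *len receives the number of bits consumed (0 when invalid). */
static int decode_mba_increment(const char *bits, int *len)
{
    assert(bits != NULL && len != NULL);
    for (int i = 0; i < 33; i++) {
        size_t n = strlen(mba_codes[i]);
        if (strncmp(bits, mba_codes[i], n) == 0) {
            *len = (int)n;
            return i + 1;
        }
    }
    if (strncmp(bits, "00000001111", 11) == 0) { *len = 11; return MBA_STUFFING; }
    if (strncmp(bits, "00000001000", 11) == 0) { *len = 11; return MBA_ESCAPE; }
    *len = 0;
    return MBA_INVALID;
}
```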
B.2 Macroblock type

The properties of the macroblock are determined by the macroblock_type VLC according to these tables.

Table B.2a. - Variable length codes for macroblock_type in intra-coded pictures (I-pictures).

macroblock_type  macroblock_  macroblock_      macroblock_       macroblock_  macroblock_
VLC code         quant        motion_forward   motion_backward   pattern      intra
1                0            0                0                 0            1
01               1            0                0                 0            1

Table B.2b. - Variable length codes for macroblock_type in predictive-coded pictures (P-pictures).

macroblock_type  macroblock_  macroblock_      macroblock_       macroblock_  macroblock_
VLC code         quant        motion_forward   motion_backward   pattern      intra
1                0            1                0                 1            0
01               0            0                0                 1            0
001              0            1                0                 0            0
00011            0            0                0                 0            1
00010            1            1                0                 1            0
00001            1            0                0                 1            0
000001           1            0                0                 0            1

Table B.2c. - Variable length codes for macroblock_type in bidirectionally predictive-coded pictures (B-pictures).

macroblock_type  macroblock_  macroblock_      macroblock_       macroblock_  macroblock_
VLC code         quant        motion_forward   motion_backward   pattern      intra
10               0            1                1                 0            0
11               0            1                1                 1            0
010              0            0                1                 0            0
011              0            0                1                 1            0
0010             0            1                0                 0            0
0011             0            1                0                 1            0
00011            0            0                0                 0            1
00010            1            1                1                 1            0
000011           1            1                0                 1            0
000010           1            0                1                 1            0
000001           1            0                0                 0            1

Table B.2d. - Variable length codes for macroblock_type in dc intra-coded pictures (D-pictures).

macroblock_type  macroblock_  macroblock_      macroblock_       macroblock_  macroblock_
VLC code         quant        motion_forward   motion_backward   pattern      intra
1                0            0                0                 0            1
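The tables above map each macroblock_type code to five flags. The C sketch below stores Tables B.2b and B.2c as arrays and decodes a type by prefix matching; the struct mb_type_entry and the function decode_mb_type are hypothetical names, and the string-of-bits input is only for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of Tables B.2a to B.2d: the VLC code and the five macroblock_* flags. */
typedef struct {
    const char *code;
    unsigned char quant, motion_forward, motion_backward, pattern, intra;
} mb_type_entry;

static const mb_type_entry p_picture_types[] = {   /* Table B.2b */
    {"1",      0, 1, 0, 1, 0},
    {"01",     0, 0, 0, 1, 0},
    {"001",    0, 1, 0, 0, 0},
    {"00011",  0, 0, 0, 0, 1},
    {"00010",  1, 1, 0, 1, 0},
    {"00001",  1, 0, 0, 1, 0},
    {"000001", 1, 0, 0, 0, 1},
};

static const mb_type_entry b_picture_types[] = {   /* Table B.2c */
    {"10",     0, 1, 1, 0, 0},
    {"11",     0, 1, 1, 1, 0},
    {"010",    0, 0, 1, 0, 0},
    {"011",    0, 0, 1, 1, 0},
    {"0010",   0, 1, 0, 0, 0},
    {"0011",   0, 1, 0, 1, 0},
    {"00011",  0, 0, 0, 0, 1},
    {"00010",  1, 1, 1, 1, 0},
    {"000011", 1, 1, 0, 1, 0},
    {"000010", 1, 0, 1, 1, 0},
    {"000001", 1, 0, 0, 0, 1},
};

/* Return the entry whose code is a prefix of bits, or NULL if none matches. */
static const mb_type_entry *decode_mb_type(const mb_type_entry *table, size_t n,
                                           const char *bits)
{
    assert(table != NULL && bits != NULL);
    for (size_t i = 0; i < n; i++)
        if (strncmp(bits, table[i].code, strlen(table[i].code)) == 0)
            return &table[i];
    return NULL;
}
```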
B.3 Macroblock pattern

Table B.3. - Variable length codes for coded_block_pattern.

coded_block_pattern
VLC code        cbp
111             60
1101            4
1100            8
1011            16
1010            32
1001 1          12
1001 0          48
1000 1          20
1000 0          40
0111 1          28
0111 0          44
0110 1          52
0110 0          56
0101 1          1
0101 0          61
0100 1          2
0100 0          62
0011 11         24
0011 10         36
0011 01         3
0011 00         63
0010 111        5
0010 110        9
0010 101        17
0010 100        33
0010 011        6
0010 010        10
0010 001        18
0010 000        34
0001 1111       7
0001 1110       11
0001 1101       19
0001 1100       35
0001 1011       13
0001 1010       49
0001 1001       21
0001 1000       41
0001 0111       14
0001 0110       50
0001 0101       22
0001 0100       42
0001 0011       15
0001 0010       51
0001 0001       23
0001 0000       43
0000 1111       25
0000 1110       37
0000 1101       26
0000 1100       38
0000 1011       29
0000 1010       45
0000 1001       53
0000 1000       57
0000 0111       30
0000 0110       46
0000 0101       54
0000 0100       58
0000 0011 1     31
0000 0011 0     47
0000 0010 1     55
0000 0010 0     59
0000 0001 1     27
0000 0001 0     39
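Each cbp value in Table B.3 is a 6-bit mask saying which of the six blocks of the macroblock are coded. As we read the block layer, the most significant bit (value 32) refers to block 0 and the least significant bit (value 1) to block 5, with blocks 0 to 3 luminance, block 4 Cb and block 5 Cr; that bit assignment is an assumption of this sketch, and expand_cbp is a hypothetical name. Under it, the shortest code, 111, signals cbp 60: all four luminance blocks coded and neither chrominance block.

```c
#include <assert.h>

/* Expand coded_block_pattern into pattern_code[0..5], block 0 first. */
static void expand_cbp(int cbp, int pattern_code[6])
{
    assert(cbp >= 0 && cbp <= 63);
    for (int i = 0; i < 6; i++)
        pattern_code[i] = (cbp >> (5 - i)) & 1;
}
```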






©ISO/IEC 



ISO/IEC 11172-2: 1993(E) 



B.4 Motion vectors 



Table B.4. - Variable length codes for motion_horizontal_forward_code, motion_vertical_forward_code,
motion_horizontal_backward_code and motion_vertical_backward_code.

motion
VLC code           code
0000 0011 001      -16
0000 0011 011      -15
0000 0011 101      -14
0000 0011 111      -13
0000 0100 001      -12
0000 0100 011      -11
0000 0100 11       -10
0000 0101 01       -9
0000 0101 11       -8
0000 0111          -7
0000 1001          -6
0000 1011          -5
0000 111           -4
0001 1             -3
0011               -2
011                -1
1                  0
010                1
0010               2
0001 0             3
0000 110           4
0000 1010          5
0000 1000          6
0000 0110          7
0000 0101 10       8
0000 0101 00       9
0000 0100 10       10
0000 0100 010      11
0000 0100 000      12
0000 0011 110      13
0000 0011 100      14
0000 0011 010      15
0000 0011 000      16
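In Table B.4 the code for -m differs from the code for +m only in its last bit, which is 0 for positive and 1 for negative values. The C sketch below exploits that symmetry by storing only the 16 magnitude prefixes; the names mv_magnitude_prefix and decode_motion_code are hypothetical, and the string-of-bits input is for readability.

```c
#include <assert.h>
#include <string.h>

/* Codes of Table B.4 for +1 .. +16 with the final (sign) bit removed. */
static const char *const mv_magnitude_prefix[16] = {
    "01", "001", "0001", "000011", "0000101", "0000100", "0000011",
    "000001011", "000001010", "000001001", "0000010001", "0000010000",
    "0000001111", "0000001110", "0000001101", "0000001100"
};

/* Decode one motion_*_code. Returns 1 and stores the value (-16..16) and the
 * code length on success, or returns 0 if no code matches. */
static int decode_motion_code(const char *bits, int *value, int *len)
{
    assert(bits != NULL && value != NULL && len != NULL);
    if (bits[0] == '1') {                 /* "1" is the code for 0 */
        *value = 0;
        *len = 1;
        return 1;
    }
    for (int m = 0; m < 16; m++) {
        size_t n = strlen(mv_magnitude_prefix[m]);
        if (strncmp(bits, mv_magnitude_prefix[m], n) == 0 &&
            (bits[n] == '0' || bits[n] == '1')) {
            *value = (bits[n] == '1') ? -(m + 1) : m + 1;   /* last bit: sign */
            *len = (int)n + 1;
            return 1;
        }
    }
    return 0;
}
```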
B.5 DCT coefficients

Table B.5a. - Variable length codes for dct_dc_size_luminance.

VLC code      dct_dc_size_luminance
100           0
00            1
01            2
101           3
110           4
1110          5
11110         6
111110        7
1111110       8

Table B.5b. - Variable length codes for dct_dc_size_chrominance.

VLC code      dct_dc_size_chrominance
00            0
01            1
10            2
110           3
1110          4
11110         5
111110        6
1111110       7
11111110      8
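After a dct_dc_size code from Table B.5a or B.5b, dct_dc_size further bits carry the difference of the dc term. As we understand the block layer, a leading 1 marks a positive difference equal to the bits themselves, and a leading 0 a negative one, so that size s covers the ranges -(2^s - 1) to -2^(s-1) and 2^(s-1) to 2^s - 1. The C sketch below encodes that reading; dc_differential is a hypothetical name.

```c
#include <assert.h>

/* Interpret the dct_dc_differential bits (given as an unsigned value) that
 * follow a dct_dc_size code; size 0 means a zero difference. */
static int dc_differential(int size, unsigned bits)
{
    assert(size >= 0 && size <= 8);
    if (size == 0)
        return 0;
    assert(bits < (1u << size));
    if (bits & (1u << (size - 1)))        /* leading 1: positive */
        return (int)bits;
    return (int)bits + 1 - (1 << size);   /* leading 0: negative */
}
```

For example, size 3 with bits 011 gives 3 + 1 - 8 = -4, and bits 100 give +4.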
Table B.5c. - Variable length codes for dct_coeff_first and dct_coeff_next.

dct_coeff_first and dct_coeff_next
variable length code (NOTE 1)     run     level
10                                end_of_block
1 s (NOTE 2)                      0       1
11 s (NOTE 3)                     0       1
011 s                             1       1
0100 s                            0       2
0101 s                            2       1
0010 1 s                          0       3
0011 1 s                          3       1
0011 0 s                          4       1
0001 10 s                         1       2
0001 11 s                         5       1
0001 01 s                         6       1
0001 00 s                         7       1
0000 110 s                        0       4
0000 100 s                        2       2
0000 111 s                        8       1
0000 101 s                        9       1
0000 01                           escape
0010 0110 s                       0       5
0010 0001 s                       0       6
0010 0101 s                       1       3
0010 0100 s                       3       2
0010 0111 s                       10      1
0010 0011 s                       11      1
0010 0010 s                       12      1
0010 0000 s                       13      1
0000 0010 10 s                    0       7
0000 0011 00 s                    1       4
0000 0010 11 s                    2       3
0000 0011 11 s                    4       2
0000 0010 01 s                    5       2
0000 0011 10 s                    14      1
0000 0011 01 s                    15      1
0000 0010 00 s                    16      1

NOTES

1 - The last bit 's' denotes the sign of the level, '0' for positive, '1' for negative.
2 - This code shall be used for dct_coeff_first.
3 - This code shall be used for dct_coeff_next.
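Each code in Tables B.5c to B.5e stands for a (run, level) pair: run zero-valued coefficients are skipped in zig-zag order, then a coefficient of value level is placed. Presumably Note 2 can give dct_coeff_first the shorter code 1s because end_of_block (10) cannot be the first code of a block. The C sketch below places decoded pairs into dct_zz[]; run_level and expand_run_levels are hypothetical names, and starting at index 1 for intra blocks (whose dc term is coded separately) is an assumption.

```c
#include <assert.h>
#include <string.h>

typedef struct { int run, level; } run_level;

/* Place (run, level) pairs into dct_zz[] starting at index `start` (0 for
 * non-intra blocks, 1 after a separately coded dc term). All 64 entries are
 * cleared first, so an intra caller stores its dc value afterwards. Returns the
 * index of the last coefficient (start - 1 if none), or -1 if a run overflows. */
static int expand_run_levels(const run_level *pairs, int n, int start,
                             int dct_zz[64])
{
    assert(start == 0 || start == 1);
    memset(dct_zz, 0, 64 * sizeof dct_zz[0]);
    int i = start - 1;
    for (int k = 0; k < n; k++) {
        assert(pairs[k].run >= 0 && pairs[k].level != 0);
        i += pairs[k].run + 1;
        if (i > 63)
            return -1;                    /* malformed block */
        dct_zz[i] = pairs[k].level;
    }
    return i;
}
```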
Table B.5d. - Variable length codes for dct_coeff_first and dct_coeff_next.

dct_coeff_first and dct_coeff_next
variable length code (NOTE)       run     level
0000 0001 1101 s                  0       8
0000 0001 1000 s                  0       9
0000 0001 0011 s                  0       10
0000 0001 0000 s                  0       11
0000 0001 1011 s                  1       5
0000 0001 0100 s                  2       4
0000 0001 1100 s                  3       3
0000 0001 0010 s                  4       3
0000 0001 1110 s                  6       2
0000 0001 0101 s                  7       2
0000 0001 0001 s                  8       2
0000 0001 1111 s                  17      1
0000 0001 1010 s                  18      1
0000 0001 1001 s                  19      1
0000 0001 0111 s                  20      1
0000 0001 0110 s                  21      1
0000 0000 1101 0 s                0       12
0000 0000 1100 1 s                0       13
0000 0000 1100 0 s                0       14
0000 0000 1011 1 s                0       15
0000 0000 1011 0 s                1       6
0000 0000 1010 1 s                1       7
0000 0000 1010 0 s                2       5
0000 0000 1001 1 s                3       4
0000 0000 1001 0 s                5       3
0000 0000 1000 1 s                9       2
0000 0000 1000 0 s                10      2
0000 0000 1111 1 s                22      1
0000 0000 1111 0 s                23      1
0000 0000 1110 1 s                24      1
0000 0000 1110 0 s                25      1
0000 0000 1101 1 s                26      1

NOTE - The last bit 's' denotes the sign of the level, '0' for positive, '1' for negative.
Table B.5e. - Variable length codes for dct_coeff_first and dct_coeff_next (concluded).

dct_coeff_first and dct_coeff_next
variable length code (NOTE)       run     level
0000 0000 0111 11 s               0       16
0000 0000 0111 10 s               0       17
0000 0000 0111 01 s               0       18
0000 0000 0111 00 s               0       19
0000 0000 0110 11 s               0       20
0000 0000 0110 10 s               0       21
0000 0000 0110 01 s               0       22
0000 0000 0110 00 s               0       23
0000 0000 0101 11 s               0       24
0000 0000 0101 10 s               0       25
0000 0000 0101 01 s               0       26
0000 0000 0101 00 s               0       27
0000 0000 0100 11 s               0       28
0000 0000 0100 10 s               0       29
0000 0000 0100 01 s               0       30
0000 0000 0100 00 s               0       31
0000 0000 0011 000 s              0       32
0000 0000 0010 111 s              0       33
0000 0000 0010 110 s              0       34
0000 0000 0010 101 s              0       35
0000 0000 0010 100 s              0       36
0000 0000 0010 011 s              0       37
0000 0000 0010 010 s              0       38
0000 0000 0010 001 s              0       39
0000 0000 0010 000 s              0       40
0000 0000 0011 111 s              1       8
0000 0000 0011 110 s              1       9
0000 0000 0011 101 s              1       10
0000 0000 0011 100 s              1       11
0000 0000 0011 011 s              1       12
0000 0000 0011 010 s              1       13
0000 0000 0011 001 s              1       14
0000 0000 0001 0011 s             1       15
0000 0000 0001 0010 s             1       16
0000 0000 0001 0001 s             1       17
0000 0000 0001 0000 s             1       18
0000 0000 0001 0100 s             6       3
0000 0000 0001 1010 s             11      2
0000 0000 0001 1001 s             12      2
0000 0000 0001 1000 s             13      2
0000 0000 0001 0111 s             14      2
0000 0000 0001 0110 s             15      2
0000 0000 0001 0101 s             16      2
0000 0000 0001 1111 s             27      1
0000 0000 0001 1110 s             28      1
0000 0000 0001 1101 s             29      1
0000 0000 0001 1100 s             30      1
0000 0000 0001 1011 s             31      1

NOTE - The last bit 's' denotes the sign of the level, '0' for positive, '1' for negative.
Table B.5f. - Encoding of run and level following an escape code, either as a 14-bit fixed length code
(-127 <= level <= 127) or as a 22-bit fixed length code (-255 <= level <= -128, 128 <= level <= 255).
(Note - This yields total escape code lengths of 20 bits and 28 bits respectively.)

fixed length code        run
0000 00                  0
0000 01                  1
0000 10                  2
...                      ...
1111 11                  63

fixed length code        level
forbidden                -256
1000 0000 0000 0001      -255
1000 0000 0000 0010      -254
...                      ...
1000 0000 0111 1111      -129
1000 0000 1000 0000      -128
1000 0001                -127
1000 0010                -126
...                      ...
1111 1110                -2
1111 1111                -1
forbidden                0
0000 0001                1
...                      ...
0111 1111                127
0000 0000 1000 0000      128
0000 0000 1000 0001      129
...                      ...
0000 0000 1111 1111      255
Annex C

(normative)
Video buffering verifier



Constant rate coded video bitstreams shall meet constraints imposed through a Video Buffering Verifier
(VBV) defined in clause C.1.

The VBV is a hypothetical decoder which is conceptually connected to the output of an encoder. Coded data
are placed in the input buffer of the model decoder at the constant bitrate that is being used. Coded data are
removed from the buffer as defined in C.1.4, below. It is a requirement of the encoder (or editor) that the
bitstream it produces will not cause the VBV input buffer to either overflow or underflow.

C.1 Video buffering verifier

C.1.1 The VBV and the video encoder have the same clock frequency as well as the same picture rate, and
are operated synchronously.

C.1.2 The VBV has an input buffer of size B, where B is given in the vbv_buffer_size field in the
sequence header.

C.1.3 The VBV input buffer is initially empty. After filling the input buffer with all the data that
precedes the first picture start code and the picture start code itself, the input buffer is filled from the
bitstream for the time specified by the vbv_delay field in the video bitstream.

C.1.4 All of the picture data for the picture that has been in the buffer longest is instantaneously
removed. Then after each subsequent picture interval all of the picture data for the picture which at that
time has been in the buffer longest is instantaneously removed.

For the purposes of this annex, picture data includes any sequence header and group of pictures layer
data that immediately precede the picture start code, as well as all the picture data elements and any
trailing stuffing bits or bytes. For the first coded picture in the video sequence, any zero bit or
byte stuffing immediately preceding the sequence header is also included in the picture data.

The VBV buffer is examined immediately before removing any picture data and immediately after this
picture data is removed. Each time the VBV is examined, its occupancy shall lie between zero bits and B
bits, where B is the size of the VBV buffer indicated by vbv_buffer_size in the sequence header.

This is a requirement for the entire video bitstream.

To meet these requirements, the number of bits for the (n+1)'th coded picture, $d_{n+1}$, shall satisfy:

$$d_{n+1} > B_n + \frac{2R}{P} - B$$

$$d_{n+1} \le B_n + \frac{R}{P}$$

Real-valued arithmetic is used in these inequalities.

where

$n \ge 0$

$B$ = VBV receiving buffer size given by vbv_buffer_size * 16 384 bits.

$B_n$ = the buffer occupancy (measured in bits) just after time $t_n$.

$R$ = bitrate measured in bits/s. The full precision of the bitrate, rather than the rounded
value encoded by the bit_rate field in the sequence header, shall be used by the
encoder in the VBV model.

$P$ = nominal number of pictures per second.

$t_n$ = the time when the n'th coded picture is removed from the VBV buffer.
Figure C.1 - VBV buffer occupancy
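The two inequalities are equivalent to examining the buffer before and after each removal. The C sketch below simulates that examination with real arithmetic; vbv_ok is a hypothetical name, and first_fill stands for the occupancy just before the first picture is removed, which in a real bitstream follows from vbv_delay.

```c
#include <assert.h>

/* Simulate the VBV: B buffer size in bits, R bitrate in bits/s, P pictures per
 * second, d[] coded picture sizes in bits. Returns 1 if every examination
 * finds the occupancy within [0, B], else 0. */
static int vbv_ok(double B, double R, double P, double first_fill,
                  const double d[], int n)
{
    assert(B > 0.0 && R > 0.0 && P > 0.0 && n >= 0);
    double occupancy = first_fill;
    for (int i = 0; i < n; i++) {
        if (occupancy > B)
            return 0;                 /* overflow, examined before removal */
        occupancy -= d[i];            /* instantaneous removal of picture i */
        if (occupancy < 0.0)
            return 0;                 /* underflow, examined after removal */
        occupancy += R / P;           /* filling during one picture interval */
    }
    return 1;
}
```

The underflow test corresponds to $d_{n+1} \le B_n + R/P$ and the overflow test at the following removal to $d_{n+1} > B_n + 2R/P - B$.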
Annex D

(informative) 

Guide to encoding video 



D.1 Introduction 

This annex provides background material to help readers understand and implement this part of ISO/IEC
11172. The normative clauses of this part of ISO/IEC 11172 do not specify the design [...]. [...] the
bitstream in such a way that it is fairly straightforward to design a [...] that differ considerably in
architecture and implementation [...] the methods and the results of the decoding process precisely.

[...] medium which can deliver data at approximately 1,2 Mbit/s [...] this data rate. The constrained [...]
bitstreams that is expected to be widely used, is limited [...] data rates [...].

[...] telecommunications applications [...] ISO/IEC 10918 by the [...] committee aimed at the coding of
still pictures [6]. Elements of both [...] were incorporated into this part of ISO/IEC 11172, but subsequent
development work by the committee resulted in coding elements that are new to this part of ISO/IEC 11172.
[...] gives an account of the method by which ISO/IEC JTC1/SC29/WG11 (MPEG) developed this part of
ISO/IEC 11172, and a summary of this part of ISO/IEC 11172 itself.

D.2 Overview 

D.2.1 Video concepts 

[...] picture rates of about 24 to 30 pictures per second. The use of the word "picture" rather than "frame"
is deliberate. This part of ISO/IEC 11172 codes progressively-scanned images [...]. Interlaced source video
is converted to a non-interlaced format before coding. After decoding, the decoder may optionally produce
an interlaced format for display.

This part of ISO/IEC 11172 is designed to permit several methods of viewing coded video which are
normally associated with VCRs, such as forward playback, freeze picture, fast forward, fast reverse, and slow
forward. In addition, random access may be possible. The ability of the decoder to implement these modes
depends to some extent on the nature of the digital storage medium on which the coded video is stored.

The overall process of encoding and decoding is illustrated below:



[Figure: Source -> Preprocessing -> Encoding -> Storage and/or Transmission -> Decoding -> Postprocessing -> Display]

Figure D.1 - Coding and decoding process

Figure D.1 shows a typical sequence of operations that must be performed before moving pictures can be
[...]. The unencoded source may exist in many forms [...]. D.3 describes how such a source may be converted
into the appropriate resolution for subsequent encoding. In the encoding step, the encoder must be aware of
the decoder buffer [...]; [...] and its overflow and underflow problem is introduced in D.4 and [...] D.6.
The structure of an ISO/IEC 11172-2 bitstream is covered in D.5 [...].

D.2.2 MPEG video compression techniques 

[...] of moving pictures, each picture is treated as a two-dimensional array of picture elements (pels). The
colour representation for each pel consists of three components: Y (luminance), and two chrominance
components, Cb and Cr.

Compression [...] comes from the use of [...] techniques: subsampling of the chrominance information [...]
the human visual system (HVS), quantization, [...]

D.2.2.1 Subsampling of chrominance information 

The HVS is most sensitive to the resolution of the luminance component, so the Y pel values are [...]

D.2.2.2 Quantization 

Quantization represents a range of values by a single value in the range. For example, converting a real
number to the nearest integer is a form of quantization. [...] quantization noise. Under some circumstances,
the HVS is less sensitive to quantization noise, so such noise can be allowed to be large, thus increasing
coding efficiency.
D.2.2.3 Predictive coding

Predictive coding is a technique to improve the compression through statistical redundancy. Based on values
previously decoded, both the encoder and decoder can estimate or predict the value of a pel yet to be
encoded or decoded. The difference between the predicted and actual values is encoded. This difference value
is the prediction error, which the decoder can use to correct the prediction. Most error values will be small
and cluster around the value 0, since pel values typically do not have large changes within a small spatial
neighbourhood. The probability distribution of the prediction error is skewed and compresses better than the
distribution of the pel values themselves. Additional information can be discarded by quantizing the
prediction error. In this International Standard, predictive coding is also used for the dc-values of
successive luminance or chrominance blocks and in the encoding of motion vectors.
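A minimal C sketch of lossless predictive coding (DPCM), in which each value is predicted by the previous one and only the prediction error is kept, illustrates the idea; the function names and the explicit initial predictor pred0 are hypothetical, and the sketch omits the quantization of the error mentioned above.

```c
#include <assert.h>

/* Encode: keep only the difference between each value and its prediction. */
static void dpcm_encode(const int *x, int n, int pred0, int *err)
{
    assert(n >= 0);
    int pred = pred0;
    for (int i = 0; i < n; i++) {
        err[i] = x[i] - pred;
        pred = x[i];                  /* next prediction: the value just coded */
    }
}

/* Decode: add each error to the same prediction the encoder used. */
static void dpcm_decode(const int *err, int n, int pred0, int *x)
{
    assert(n >= 0);
    int pred = pred0;
    for (int i = 0; i < n; i++) {
        x[i] = pred + err[i];
        pred = x[i];
    }
}
```

Slowly varying values such as 100, 104, 103, 103 produce the small errors 0, 4, -1, 0, which a variable-length code can represent compactly.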

D.2.2.4 Motion compensation and interframe coding 

Motion compensation (MC) predicts the values of a block of pels in a picture by relocating a block of
neighbouring pel values from a known picture. The motion is described as a two-dimensional motion
vector that specifies where to retrieve a block of pel values from a previously decoded picture that is used to
predict pel values of the current block. The simplest example is a scene where the camera is not moving
and no objects in the scene are moving. The pel values at each image location remain the same, and the
motion vector for each block is 0. In general, however, the encoder may transmit a motion vector for each
macroblock. The translated block from the known picture becomes a prediction for the block in the picture
to be encoded. The technique relies on the fact that within a short sequence of pictures of the same general
scene, many objects remain in the same location while others move only a short distance.

D.2.2.5 Frequency transformation

The discrete cosine transform (DCT) converts an 8 by 8 block of pel values to an 8 by 8 matrix of
horizontal and vertical spatial frequency coefficients. An 8 by 8 block of pel values can be reconstructed by
performing the inverse discrete cosine transform (IDCT) on the spatial frequency coefficients. In general,
most of the energy is concentrated in the low frequency coefficients, which are conventionally written in the
upper left corner of the transformed matrix. Compression is achieved by a quantization step, where the
quantization intervals are identified by an index. Since the encoder identifies the interval and not the exact
value within the interval, the pel values of the block reconstructed by the IDCT have reduced accuracy.

The DCT coefficient in location (0,0) (upper left) of the block represents the zero horizontal and zero
vertical frequency and is called the dc coefficient. The dc coefficient is proportional to the average pel value
of the 8 by 8 block, and additional compression is provided through predictive coding since the difference in
the average value of neighbouring 8 by 8 blocks tends to be relatively small. The other coefficients
represent one or more nonzero horizontal or nonzero vertical spatial frequencies, and are called ac
coefficients. The quantization level of the coefficients corresponding to the higher spatial frequencies favors
the creation of an ac coefficient of 0 by choosing a quantization step size such that the HVS is unlikely to
perceive the loss of the particular spatial frequency unless the coefficient value lies above the particular
quantization level. The statistical encoding of the expected runs of consecutive zero-valued coefficients of
higher-order coefficients accounts for considerable compression gain. To cluster nonzero coefficients early in
the series and encode as many zero coefficients as possible following the last nonzero coefficient in the
ordering, the coefficient sequence is specified to be a zig-zag ordering; see figure D.30. The ordering
concentrates the highest spatial frequencies at the end of the series.
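The zig-zag ordering can be generated by walking the anti-diagonals of the 8 by 8 matrix and alternating direction. The C sketch below produces the scan as row-major indices (row * 8 + column), assuming the row index is the vertical frequency; zigzag_order is a hypothetical name, and the sketch is meant to illustrate the ordering rather than replace figure D.30.

```c
#include <assert.h>

/* Fill order[k] with the row-major index of the k-th coefficient in zig-zag order. */
static void zigzag_order(int order[64])
{
    int k = 0;
    for (int s = 0; s <= 14; s++) {               /* s = row + column */
        if (s % 2 == 0) {                         /* even: travel up and right */
            for (int row = (s < 8 ? s : 7); row >= 0 && s - row <= 7; row--)
                order[k++] = row * 8 + (s - row);
        } else {                                  /* odd: travel down and left */
            for (int col = (s < 8 ? s : 7); col >= 0 && s - col <= 7; col--)
                order[k++] = (s - col) * 8 + col;
        }
    }
    assert(k == 64);
}
```

The first entries are 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, so the dc coefficient comes first and the highest frequency (index 63) comes last.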

D.2.2.6 Variable-length coding 

Variable-length coding (VLC) is a statistical coding technique that assigns codewords to values to be
encoded. Values of high frequency of occurrence are assigned short codewords, and those of infrequent
occurrence are assigned long codewords. On average, the more frequent shorter codewords dominate, such
that the code string is shorter than the original data.

D.2.2.7 Picture interpolation 

If the decoder reconstructs a picture from the past and a picture from the future, then the intermediate
pictures can be reconstructed by the technique of interpolation, or bidirectional prediction. Blocks in the
intermediate pictures can be forward and backward predicted and translated by means of motion vectors. The
decoder may reconstruct pel values belonging to a given block as an average of values from the past and
future pictures.
D.2.3 Bitstream hierarchy

[...] a header and some number of groups of pictures [...] I- or P- [...] pictures, which are coded using
motion compensation from a [...] picture; B-pictures, or bidirectional predictive coded pictures, which [...]
Figure D.2 - Dependency relationship between I, B, and P-pictures

Because of the picture dependencies, the bitstream order, i.e. the order in which pictures are [...], is not
the display order, but rather the order in which the decoder requires them to decode the bitstream. An
example of a sequence of pictures, in display order, might be:

I  B  B  P  B  B  P  B  B  P  B  B  I  B  B  P  B  B  P
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18

Figure D.3 - Typical sequence of pictures in display order

whereas the bitstream order would be as shown below:

I  P  B  B  P  B  B  P  B  B  I  B  B  P  B  B  P  B  B
0  3  1  2  6  4  5  9  7  8  12 10 11 15 13 14 18 16 17

Figure D.4 - Typical sequence of pictures in bitstream order

[...] a header and one or more slices. The picture header contains the [...] time, picture type, and [...]

[...] Should the bitstream become unreadable within a picture, the decoder should be able to recover by
waiting for the next slice, without having to drop an entire picture.
A macroblock is the basic unit for motion compensation and quantizer scale changes. The macroblock header
contains quantizer scale and motion compensation information.

[Figure: four luminance (Y) blocks numbered 0 to 3, followed by one Cb block and one Cr block]

Figure D.5 - Macroblock structure

A macroblock contains a 16-pel by 16-line section of the luminance component and the spatially
corresponding 8-pel by 8-line section of each chrominance component. A skipped macroblock is one for
which no information is stored (see 2.4.4.4).

[Figure: an 8 by 8 block; each block contains 64 pels]

Figure D.6 - Block structure
D.2.4 Decoder overview

A simplified block diagram of a possible decoder implementation is shown below: 



[Figure: block diagram with Input Buffer, VLC Decoder, Inverse zig-zag & Quantizer, Inverse DCT, Forward MC, Backward MC, Interpolated MC, Future Picture Store and Display Buffer, producing Decoded Video]

Figure D.7 - Simplified decoder block diagram

It is instructive to follow the method which the decoder uses to decode a bitstream containing the sequence 
of pictures given in Fig D.4, and display them in the order given in Fig D.3. The following description is 
simplified for clarity. . 

The input bitstream is accumulated in the Input Buffer until needed. The Variable Length Code (VLC) 
Decoder decodes the header of the first picture, picture 0, and determines that it is an I-picture. The VLC 
Decoder produces quantized coefficients corresponding to the quantized DCT coefficients. Tbese are 
assembled for each 8 by 8 block of pels in the image by inverse zig-zag scanning. The Inverse Quantizer 
produces the actual DCT coefficients using the quantization step size. Hie coefficients are then transformed 
into pel values by the Inverse DCT transformer and stored in the Previous Picture Store and the Display 
Buffer. Hie picture may be displayed at the appropriate time. 

The VLC Decoder decodes the header of the next picture, picture 3, and determines that it is a P-picture.
For each block, the VLC Decoder decodes motion vectors giving the displacement from the stored previous
picture, and quantized coefficients corresponding to the quantized DCT coefficients of the difference block.
These quantized coefficients are inverse quantized to produce the actual DCT coefficients. The coefficients
are then transformed into pel difference values and added to the predicted block produced by applying the
motion vectors to blocks in the stored previous picture. The resultant block is stored in the Future Picture
Store and the Display Buffer. This picture cannot be displayed until B-pictures 1 and 2 have been received,
decoded, and displayed.

The VLC Decoder decodes the header of the next picture, picture 1, and determines that it is a B-picture.
For each block, the VLC Decoder decodes motion vectors giving the displacement from the stored previous
or future pictures or both, and quantized coefficients corresponding to the quantized DCT coefficients of the
difference block. These quantized coefficients are inverse quantized to produce the actual DCT coefficients.
The coefficients are then inverse transformed into difference pel values and added to the predicted block
produced by applying the motion vectors to the stored pictures. The resultant block is then stored in the
Display Buffer. It may be displayed at the appropriate time.

The VLC Decoder decodes the header of the next picture, picture 2, and determines that it is a B-picture. It
is decoded using the same method as for picture 1. After decoding picture 2, picture 0, which is in the
Previous Picture Store, is no longer needed and may be discarded.

The VLC Decoder decodes the header of the next picture, picture 6, and determines that it is a P-picture.
The picture in the Future Picture Store is copied into the Previous Picture Store, then decoding proceeds as
for picture 3. Picture 6 should not be displayed until pictures 4 and 5 have been received and displayed.
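
The display-order bookkeeping in this walk-through can be sketched in a few lines of Python. This is a minimal model, assuming only that each I- or P-picture is held back until the next I- or P-picture has been decoded, while B-pictures are shown as soon as they are decoded; the function name and the (number, type) input format are illustrative, not part of this part of ISO/IEC 11172.

```python
def display_order(coded_pictures):
    """Return picture numbers in display order.

    coded_pictures is a list of (picture_number, picture_type) tuples in
    bitstream (decoding) order, with picture_type one of "I", "P", "B".
    """
    shown = []
    held_anchor = None  # I- or P-picture waiting for its turn to be displayed
    for number, kind in coded_pictures:
        if kind in ("I", "P"):
            # A newly decoded anchor releases the previous one for display.
            if held_anchor is not None:
                shown.append(held_anchor)
            held_anchor = number
        else:
            # B-pictures are not used for prediction, so show them at once.
            shown.append(number)
    if held_anchor is not None:
        shown.append(held_anchor)  # flush the last anchor at end of stream
    return shown


# Decoding order of the walk-through: 0 (I), 3 (P), 1 (B), 2 (B), 6 (P), 4 (B), 5 (B)
print(display_order([(0, "I"), (3, "P"), (1, "B"), (2, "B"), (6, "P"), (4, "B"), (5, "B")]))
# [0, 1, 2, 3, 4, 5, 6]
```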

D.3 Preprocessing

The source material may exist in many forms, e.g. computer files or [...], but in general it must be
processed before being encoded. [...]

The choice of source picture resolution and picture rate depends on the bit rate at which [...]. If the
resolution is too high, too many bits will be expended on [...], and [...] cannot be coded accurately. If the
resolution is too low, [...] high-frequency detail will be lost. The optimum resolution is a tradeoff between
the various coding artifacts (e.g. noise and blockiness) and the perceived resolution of the picture. This
tradeoff is further complicated by the unknown viewing conditions, e.g. screen brightness and the distance of
the viewer from the screen.

[...] picture rates of 24, 25 and 30 pictures/s, a [...]

Note that [...]

D.3.1 Conversion from CCIR 601 video to MPEG SIF 

[...] If the other field is [...], the result may produce visible and objectionable artifacts. More [...]
methods [...] perceptibly [...].

[...] can be achieved by [...] subsampling. Consider a picture in [...] sampling pattern [...] may be [...]
number of [...] horizontally by a factor of 2. In addition, chrominance values may be vertically [...]. The
positions of the luminance and chrominance samples have to be chosen carefully since [...] the positions of
samples in the respective International Standards. The temporal relationship between luminance and
chrominance must also be correct.
(a) Sampling pattern for 4:2:2 (CCIR 601)     (b) Sampling pattern for MPEG (SIF)

Circles represent luminance; boxes represent chrominance.

Figure D.8 - Conversion of CCIR 601 to SIF

The following 7-tap FIR filter has been found to give good results in decimating the luminance: 

| -29 | 0 | 88 | 138 | 88 | 0 | -29 | / 256

Figure D.9 - Luminance subsampling filter tap weights 

Use of a power of two for the divisor allows a simple hardware implementation. 

[...] the chrominance samples [...] appear in between the luminance samples both horizontally and
vertically. The following linear filter with a phase shift of half a pel may be found useful.

| 1 | 3 | 3 | 1 | / 8
Figure D.10 - Chrominance subsampling filter tap weights 
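
A Python sketch of horizontal chrominance decimation with the half-pel filter of figure D.10. Several details are assumptions not taken from the text: output sample k is centred between input samples 2k and 2k+1, pels beyond the ends of the line are replaced by the nearest end pel (replicating the last pel, one of the edge strategies the text mentions), and results are rounded to the nearest integer.

```python
def decimate_chroma(line):
    """Halve the number of chrominance samples with the 1 3 3 1 / 8 filter."""
    n = len(line)
    taps = (1, 3, 3, 1)
    out = []
    for k in range(n // 2):
        acc = 0
        # The four taps cover input positions 2k-1 .. 2k+2, so the output
        # sample sits half a pel to the right of input sample 2k.
        for offset, weight in zip(range(-1, 3), taps):
            i = min(max(2 * k + offset, 0), n - 1)  # replicate the end pels
            acc += weight * line[i]
        out.append((acc + 4) // 8)  # divide by 8, rounding to nearest
    return out


print(decimate_chroma([0, 0, 0, 0, 80, 80, 80, 80]))  # [0, 10, 70, 80]
```

A divisor of 8 keeps the arithmetic to shifts and adds, in the same spirit as the power-of-two divisor of the luminance filter.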

[...] the CCIR 601 grid of figure D.8(a), the process of interpolation is [...]. The filter applied to a
zero-padded signal can be chosen to be equal to the decimation filter employed for the luminance and the
two chrominance values in the encoder.

Note that these filters are not part of the International Standard, and other filters may be used. 

[...] must be adopted, such as renormalizing the filter or replicating the last pel.

The following example shows a horizontal line of 16 luminance pels and the same line after filtering and
subsampling. In this example the data in the line is reflected at each end.

10 12 20 30 35 15 19 11 11 19 26 45 80 90 92 90
12 32 23 9 12 49 95 92

Figure D.11 - Example of filtering and subsampling of a line of pels
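
The filtering and subsampling of figure D.11 can be checked with a short Python sketch of the 7-tap filter of figure D.9. Two details are assumptions, chosen because they reproduce most of the figure: the retained samples are those centred on odd input positions, and the line is mirrored at each end with the end pel repeated (x[-1] = x[0], x[16] = x[15]). Results are rounded to the nearest integer (ties upward) and clipped to 0..255. With these choices the sketch gives 32 23 9 12 49 95 92 for the last seven outputs, as in the figure, but 11 rather than 12 for the first.

```python
TAPS = (-29, 0, 88, 138, 88, 0, -29)  # figure D.9; the weights sum to 256


def mirror(i, n):
    """Reflect index i into 0..n-1, repeating the end pel (assumed edge rule)."""
    if i < 0:
        return -i - 1
    if i >= n:
        return 2 * n - i - 1
    return i


def decimate_luma(line):
    """Filter a line of luminance pels with TAPS and keep every second result."""
    n = len(line)
    out = []
    for centre in range(1, n, 2):  # assumed phase: odd input positions
        acc = sum(w * line[mirror(centre + k - 3, n)] for k, w in enumerate(TAPS))
        value = (acc + 128) // 256  # divide by 256, rounding to nearest
        out.append(min(255, max(0, value)))
    return out


line = [10, 12, 20, 30, 35, 15, 19, 11, 11, 19, 26, 45, 80, 90, 92, 90]
print(decimate_luma(line))  # [11, 32, 23, 9, 12, 49, 95, 92]
```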
The result of this filtering and subsampling is a source input format (SIF) which has a luminance
resolution of 360 x 242 or 360 x 288, and a chrominance resolution which is half that of the luminance in
each dimension.

(a) Luminance
(b) Chrominance

Figure D.12 - Conversion from CCIR 601 into SIF 

The SIF is not quite optimum for processing by MPEG video coders. MPEG video divides the luminance
component into macroblocks of 16x16 pels. The horizontal resolution, 360, is not divisible by 16. The
same is true of the vertical resolution, 242, in the case of 525-line systems. A better match is obtained in
the horizontal direction by discarding 4 pels at each end of every line of the subsampled picture, leaving
352 pels. Care must be taken that this results in the correct configuration of luminance and chrominance
samples in the macroblock. The remaining picture is called the significant pel area, and corresponds to the
dark area in figure D.13:

[Figure: luminance and chrominance (Cr or Cb) source input planes; the chrominance significant pel area is
176 pels wide]

Figure D.13 - Source input with significant pel area shaded dark 
The conversion process is summarized in table D.1.

Table D.1 - Conversion of source formats

Picture rate (Hz)                       29,97        25
Picture aspect ratio (width:height)     4:3          4:3
Luminance (Y)
  CCIR sample resolution                720 x 484    720 x 576
  SIF                                   360 x 242    360 x 288
  Significant pel area                  352 x 240    352 x 288
Chrominance (Cb, Cr)
  CCIR sample resolution                360 x 484    360 x 576
  SIF                                   180 x 121    180 x 144
  Significant pel area                  176 x 120    176 x 144
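
The SIF and significant pel area entries of table D.1 follow from halving the CCIR 601 luminance resolution, truncating each SIF dimension to a multiple of 16, and giving chrominance half the luminance resolution in each dimension. A Python sketch of that arithmetic (the function name is illustrative):

```python
def source_formats(ccir_width, ccir_height):
    """Return (luma SIF, luma significant area, chroma SIF, chroma significant area)."""
    luma_sif = (ccir_width // 2, ccir_height // 2)
    # Keep only whole 16x16 luminance macroblocks.
    luma_sig = (luma_sif[0] // 16 * 16, luma_sif[1] // 16 * 16)
    # Chrominance has half the luminance resolution in each dimension.
    chroma_sif = (luma_sif[0] // 2, luma_sif[1] // 2)
    chroma_sig = (luma_sig[0] // 2, luma_sig[1] // 2)
    return luma_sif, luma_sig, chroma_sif, chroma_sig


print(source_formats(720, 484))  # ((360, 242), (352, 240), (180, 121), (176, 120))
print(source_formats(720, 576))  # ((360, 288), (352, 288), (180, 144), (176, 144))
```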



[...] processing steps. Other resolutions may [...].

[...] The decoder would discard [...], giving a final decoded horizontal resolution of 360 pels.

D.3.2 Conversion from film 

[...] film [...] is an excellent source for ISO/IEC 11172-2 [...].

Sometimes the source material available for compression consists of film material which has been
converted to video at some other rate. The encoder may detect this and recode at the original film rate. For
example, film which was originally [...] may have been digitized and converted to a 30 frame/s system by the
3:2 pull-down technique. In this mode, digitized pictures are shown alternately for 3 and for 2 television
fields. [...] may not be exact since the rate might be 29,97 frames/s and not 30 frames/s [...] the 3:2
pull-down technique gives. In addition, the pattern might have been changed by editing and splicing after
the conversion. A sophisticated encoder might detect the duplicated fields, average them to reduce
digitization noise, and code the result at the original 24 pictures/s rate. This should give a significant
improvement in quality over coding at 30 pictures/s, since direct coding at 30 pictures/s destroys the 3:2
pull-down timing and gives a jerky appearance to the final decoded video.
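
A Python sketch of the 3:2 pull-down pattern and of the duplicate-field detection a sophisticated encoder might perform. It is deliberately simplified: each field is reduced to one number standing in for its picture content, the cadence is assumed to start with a three-field frame, and a field is treated as a repeat when it differs from the field two positions earlier (the previous field of the same parity) by no more than a hypothetical tolerance. Real detectors must also cope with static scenes and with cadence breaks caused by editing, which this sketch ignores.

```python
def pull_down_32(frames):
    """Expand film frames, given as (top, bottom) field values, into a
    field sequence that alternates 3 and 2 fields per frame."""
    fields = []
    parity = 0  # 0 = top field, 1 = bottom field; fields strictly alternate
    for index, frame in enumerate(frames):
        for _ in range(3 if index % 2 == 0 else 2):
            fields.append((parity, frame[parity]))
            parity ^= 1
    return fields


def remove_pull_down(fields, tolerance=2.0):
    """Drop repeated fields, averaging each with its original to reduce
    noise, and pair the remaining fields back into (top, bottom) frames."""
    kept = []
    for index, (parity, value) in enumerate(fields):
        if index >= 2 and abs(value - fields[index - 2][1]) <= tolerance:
            # The original of a repeat is the second-to-last field kept.
            original_parity, original_value = kept[-2]
            kept[-2] = (original_parity, (original_value + value) / 2)
        else:
            kept.append((parity, value))
    frames = []
    for first, second in zip(kept[0::2], kept[1::2]):
        top, bottom = sorted([first, second])  # order the pair by parity
        frames.append((top[1], bottom[1]))
    return frames


film = [(10.0 * k, 10.0 * k + 5.0) for k in range(4)]
video = pull_down_32(film)              # 10 fields from 4 film frames
print(remove_pull_down(video) == film)  # True
```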

D.4 Model decoder 

D.4.1 Need for a decoder model 

A coded bitstream contains different types of pictures, and each type ideally requires a different number of
bits to encode. In addition, the video may vary in complexity with time, and an encoder may wish to
devote more coding bits to one part of a sequence than to another. For constant bit rate coding, varying the
number of bits allocated to each picture requires that the decoder have buffering to store the bits not needed
to decode the immediate picture. The extent to which an encoder can vary the number of bits allocated to
each picture depends on the amount of this buffering. If the amount of the buffering is large, an encoder can
allow greater variations, increasing the picture quality, but at the cost of increasing the decoding delay.

Encoders need to know the amount of the decoder's buffering in order to determine to what extent they can
vary the distribution of coding bits among the pictures in the sequence.

A model decoder is defined to solve two problems. It constrains the variability in the number of bits that
may be allocated to different pictures, and it allows a decoder to initialize its buffering when the system is
started. It should be noted that Part 1 of this International Standard addresses the initialisation of buffers and
the maintenance of synchronisation during playback in the case when two or more elementary streams (for
example one audio and one video stream) are multiplexed together. The tools defined in ISO/IEC 11172-1
for the maintenance of synchronisation should be used by decoders when multiplexed streams are being [...]

D.4.2 Decoder model

[...] the definition of a parameterized model decoder for this purpose. It is known as a Video Buffer
Verifier (VBV). The parameters used by a particular encoder are defined in the bitstream. This really
defines a model decoder, which is needed if encoders are to be assured that the coded bitstreams they produce
will be decodable. The model decoder looks like this:

Figure D.14 - Model decoder

A fixed-rate channel is assumed to put bits at a constant rate into the Input Buffer. At regular intervals set
by the picture rate, the Picture Decoder instantaneously removes all the bits for the next picture from the
Input Buffer. If there are too few bits in the Input Buffer, i.e. the bits for the next picture have not all been
received, then the Input Buffer underflows and there is an underflow error. If, during the time between
picture starts, the capacity of the Input Buffer is exceeded, then there is an overflow error.
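
The two error conditions of the model can be expressed as a short Python sketch. It assumes constant-rate delivery, instantaneous removal of each picture at intervals of 1/picture_rate, and an initial fill set by the first picture's vbv_delay in units of 1/90 000 s (see D.4.3). Because the buffer only fills between removals, testing for overflow just before each removal is sufficient. The function name and return convention are illustrative.

```python
def vbv_check(picture_bits, bit_rate, picture_rate, buffer_size, first_vbv_delay):
    """Return (picture_index, "overflow" or "underflow") for the first error
    in this simplified VBV model, or None if the bitstream fits."""
    fullness = bit_rate * first_vbv_delay / 90_000  # bits present at first removal
    bits_per_interval = bit_rate / picture_rate
    for index, bits in enumerate(picture_bits):
        if index > 0:
            fullness += bits_per_interval  # the channel keeps filling the buffer
        if fullness > buffer_size:
            return index, "overflow"
        if fullness < bits:
            return index, "underflow"  # picture not completely received
        fullness -= bits  # instantaneous removal by the Picture Decoder
    return None


# 1,2 Mbits/s at 30 pictures/s delivers 40 000 bits per picture period.
print(vbv_check([100_000, 40_000, 40_000], 1_200_000, 30, 327_680, 9_000))  # None
```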

Practical decoders differ from this model in several important ways. They may implement their buffering at
a different point in the decoder, or distribute it throughout the decoder. They may not remove all the bits
required to decode a picture from the Input Buffer instantaneously, they may not be able to control the start
of decoding very precisely as required by the buffer fullness parameter in the picture header, and they take a
finite time to decode. They may also be able to delay decoding for a short time to reduce the chances of
underflow occurring. But these differences depend in degree and kind on the exact method of implementation.
To accommodate the requirements of different implementations, the MPEG video committee (ISO/IEC JTC1
SC29/WG11) chose a very simple model for the decoder. Practical implementations of decoders must
ensure that they can decode any bitstream constrained by this model. In many cases this will be achieved by
using an Input Buffer that is larger than the minimum required, and by using a decoding delay that is larger
than the value derived from the vbv_delay parameter. The designer must compensate for differences between
the actual decoder and the model in order to guarantee that the decoder can handle any bitstream [...].

Encoders monitor the status of the model to control the encoder so that overflow problems do not
occur. The calculated buffer fullness is transmitted at the start of each picture so that the decoder can
maintain synchronization.

D.4.3 Buffer size and delay 

For constant bit rate operation each picture header contains a vbv_delay parameter to enable decoders to start
their decoding correctly. This parameter defines the time needed to fill the Input Buffer of figure D.14 from
an empty state to the correct level immediately before the Picture Decoder removes all the bits for the
picture. This time is thus a delay and is measured in units of 1/90 000 s. This unit was chosen because the
picture durations 1/24, 1/25, 1/29,97 and 1/30 s are almost exact multiples of it, and because it is
comparable in duration to an audio sample.

The delay is given by

D = vbv_delay / 90 000 s

For example, if vbv_delay were 9 000, then the delay would be 0,1 s. This means that at the start of a
picture [...] the Input Buffer of the model decoder should contain exactly 0,1 s worth of data from the input
bitstream.

For a bit rate R in bits/s, the number of bits in the Input Buffer at that moment is

B = D * R = vbv_delay * R / 90 000 bits

For example, if vbv_delay were 9 000 and R were 1,2 Mbits/s, then the number of bits in the Input Buffer
would be 120 000.

A constrained parameter bitstream assumes that the Input Buffer has a size of 327 680 bits, and B must never
exceed this value.
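
The arithmetic of this clause can be reproduced exactly with Python's fractions module. The sketch converts vbv_delay to the delay D and the buffer contents B, and expresses each picture duration in 1/90 000 s periods. Treating 29,97 Hz as 30 000/1 001 Hz is an assumption (the usual exact value of that rate); under it the duration is exactly 3 003 periods.

```python
from fractions import Fraction

VBV_CLOCK_HZ = 90_000  # vbv_delay counts periods of 1/90 000 s


def delay_seconds(vbv_delay):
    """D = vbv_delay / 90 000 s."""
    return Fraction(vbv_delay, VBV_CLOCK_HZ)


def buffer_bits(vbv_delay, bit_rate):
    """B = D * R bits, for a bit rate R in bits/s."""
    return delay_seconds(vbv_delay) * bit_rate


def picture_duration_periods(picture_rate):
    """Duration of one picture in 1/90 000 s periods."""
    return Fraction(VBV_CLOCK_HZ) / Fraction(picture_rate)


print(delay_seconds(9_000))             # 1/10
print(buffer_bits(9_000, 1_200_000))    # 120000
print(buffer_bits(30_000, 1_200_000))   # 400000
for rate in (24, 25, Fraction(30_000, 1_001), 30):
    print(rate, picture_duration_periods(rate))  # 3750, 3600, 3003, 3000
```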

D.5 MPEG video bitstream syntax 

This clause describes the video bitstream in a top-down fashion. A sequence is the top level of video coding.
It begins with a sequence header which defines important parameters needed by the decoder. The sequence
header is followed by one or more groups of pictures. Groups of pictures, as the name suggests, consist of
one or more individual pictures. The sequence may contain additional sequence headers. A sequence is
terminated by a sequence_end_code. ISO/IEC 11172-2 allows considerable flexibility in specifying
application parameters such as bit rate, picture rate, picture resolution, and picture aspect ratio. These
parameters are specified in the sequence header.

If these parameters, and some others, fall within certain limits, then the bitstream is called a constrained
parameter bitstream.

D.5.1 Sequence 

A video sequence commences with a sequence header and is followed by one or more groups of pictures and
is ended by a sequence_end_code. Additional sequence headers may appear within the sequence. In each
such repeated sequence header, all of the data elements with the permitted exception of those defining
quantization matrices (load_intra_quantizer_matrix, load_non_intra_quantizer_matrix and optionally [...]

D.5.1.1 Sequence header code

A coded sequence begins with a sequence header and the header starts with the sequence start code. Its value
is:

hex: 00 00 01 B3

binary: 0000 0000 0000 0000 0000 0001 1011 0011

This is a unique string of 32 bits that cannot be emulated anywhere else in the bitstream. [...]
underflow. This procedure is [...] decoder buffer [...]. [...] bits must all be zero. The decoder [...]

D.5.1.2 Horizontal size 
D.5.1.3 Vertical size 

[...] usually a multiple of 16. Note that the maximum [...] vertical position [...] is 175 (decimal), which
corresponds to a picture height of 2 800 lines. [...]

[...] values of 240 [...]; values of 288 lines are more appropriate for 625-line PAL and SECAM systems.

[...] can be coded in a macroblock. The decoder should discard these extra lines before display. [...]
replicating the last line of pels is [...] better than filling in the remaining pels with [...].
D.5.1.4 Pel aspect ratio 

[...] the shape of the pels on the viewing screen. This is needed since [...].

The pel aspect ratio does not give the shape directly, but is an index to the following look-up table:

Table D.2 - Pel aspect ratio



CODE   HEIGHT/WIDTH   COMMENT
0000   undefined      Forbidden
0001   1,0            square pels
0010   0,6735
0011   0,7031         16:9 625-line
0100   0,7615
0101   0,8055
0110   0,8437         16:9 525-line
0111   0,8935
1000   0,9157         702x575 at 4:3 = 0,9157
1001   0,9815
1010   1,0255
1011   1,0695
1100   1,0950         711x487 at 4:3 = 1,0950
1101   1,1575
1110   1,2015
1111   undefined      reserved



height / width = 0,75 * 702 / 575 = 0,9157

[...] displaying pictures on a [...] TV system (see CCIR [...]).

height / width = 0,75 * 711 / 487 = 1,0950

The code 1111 is reserved for possible future extensions to this part of ISO/IEC 11172.

[...] these two points, 1000 and 1100 [...]:

aspect ratio = 0,5855 + 0,044N

where N is the decimal value of the code. [...] useful for HDTV.

[...] aspect ratios to be specified. [...] the nearest value in the table [...]. [...] will display the decoded
pictures at the nearest pel aspect ratio of which they are capable.
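
A Python sketch of table D.2 and of the straight-line rule above. The table values are copied from table D.2; the check shows that the formula, with N the decimal value of the code, reproduces every entry except 0001, 0011, 0110, 1000 and 1100, which carry the values for square pels, the two 16:9 displays and the two 4:3 displays. The nearest-code search is one illustrative way for an encoder to pick the closest tabulated value, not a procedure taken from the text.

```python
PEL_ASPECT_RATIO = {  # code -> height/width, from table D.2
    0b0001: 1.0, 0b0010: 0.6735, 0b0011: 0.7031, 0b0100: 0.7615,
    0b0101: 0.8055, 0b0110: 0.8437, 0b0111: 0.8935, 0b1000: 0.9157,
    0b1001: 0.9815, 0b1010: 1.0255, 0b1011: 1.0695, 0b1100: 1.0950,
    0b1101: 1.1575, 0b1110: 1.2015,
}


def formula(n):
    """Straight-line rule: aspect ratio = 0,5855 + 0,044N."""
    return 0.5855 + 0.044 * n


def nearest_code(height_over_width):
    """Code whose tabulated height/width is closest to the given value."""
    return min(PEL_ASPECT_RATIO,
               key=lambda code: abs(PEL_ASPECT_RATIO[code] - height_over_width))


matches = [c for c in PEL_ASPECT_RATIO if round(formula(c), 4) == PEL_ASPECT_RATIO[c]]
print([format(c, "04b") for c in matches])
print(nearest_code(0.75 * 702 / 575), nearest_code(0.75 * 711 / 487))  # 8 12
```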

D.5.1.5 Picture rate 

This is a four-bit integer which is an index to the following table: 
Table D.3 - Picture rate

CODE   PICTURES PER SECOND
0000   Forbidden
0001   23,976
0010   24
0011   25
0100   29,97
0101   30
0110   50
0111   59,94
1000   60
1001   Reserved
...
1111   Reserved



[...] commonly available [...] analog or digital sequences. One advantage of [...] picture rates is that
standard techniques may be used to convert to the display rate of the decoder if it does not match the coded
rate.



D.5.1.6 Bit rate



The bit rate is an 18-bit integer giving the bit rate of the data channel in units of 400 bits/s. The bit rate
is assumed to be constant for the entire sequence. The actual bit rate is rounded up to the nearest multiple
of 400 bits/s. For example, a bit rate of 830 100 bits/s would be rounded up to 830 400 bits/s [...]

If all 18 bits are 1 then the bitstream is intended for variable bit rate operation. The value zero is forbidden. 
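
A Python sketch of the bit_rate field as described above: rounding up to the next multiple of 400 bits/s, the all-ones value for variable bit rate, and the forbidden zero. The function and constant names are illustrative.

```python
BIT_RATE_UNIT = 400          # bits/s per unit of the 18-bit field
VARIABLE_BIT_RATE = 0x3FFFF  # all 18 bits set to 1


def encode_bit_rate(bits_per_second):
    """Return the 18-bit bit_rate value, rounding the rate up."""
    code = -(-bits_per_second // BIT_RATE_UNIT)  # ceiling division
    if code <= 0:
        raise ValueError("bit_rate value 0 is forbidden")
    if code >= VARIABLE_BIT_RATE:
        raise ValueError("rate too high; all ones means variable bit rate")
    return code


def decode_bit_rate(code):
    """Return the rate in bits/s, or None for variable bit rate operation."""
    if code == 0:
        raise ValueError("bit_rate value 0 is forbidden")
    if code == VARIABLE_BIT_RATE:
        return None
    return code * BIT_RATE_UNIT


code = encode_bit_rate(830_100)
print(code, decode_bit_rate(code))  # 2076 830400
```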

For constant bit rate operation, the bit rate is used by the decoder in conjunction with the vbv_delay
parameter in the picture header to maintain synchronization of the decoder with a constant-rate data channel.
If the stream is multiplexed using ISO/IEC 11172-1, the time-stamps and system clock reference
information defined in ISO/IEC 11172-1 provide a more appropriate tool for performing this function.

D.5.1.7 Marker bit 

The bit rate is followed by a single reserved bit which is always set to 1. This bit prevents emulation of
start codes.
D.5.1.8 VBV buffer size 

The buffer size is a 10-bit integer giving the minimum required size of the input buffer in the model decoder
in units of 16 384 bits (2 048 bytes). For example, a buffer size of 20 would require an input buffer of
20 x 16 384 = 327 680 bits (40 960 bytes). Decoders may provide more memory than this, but if they
provide less they will probably run into buffer overflow problems while the sequence is being decoded.

D.5.1.9 Constrained Parameter flag 

If certain parameters specified in the bitstream fall within predefined limits, then the bitstream is called a 
constrained parameter bitstream. Thus the constrained parameter bitstream is a standard of performance 
giving guidelines to encoders and decoders to facilitate the exchange of bitstreams. 

The bit_rate parameter allows values up to about 100 Mbits/s, but a constrained parameter bitstream must
have a bit rate of 1,856 Mbits/s or less. Thus the bit_rate parameter must be 4 640 or less.

The picture rate parameter allows picture rates up to 60 pictures/s, but a constrained parameter bitstream 
must have a picture rate of 30 pictures/s or less. r onsiream 

The resolution of the coded picture is also specified in the sequence header. Horizontal resolutions up to
4 095 pels are allowed, but in a constrained parameter bitstream the resolution is limited to 768 pels or
less. Vertical resolutions up to 4 095 pels are allowed, but in a constrained parameter bitstream the vertical
resolution is limited to 576 pels or less. In a constrained parameter bitstream, the total number of
macroblocks per picture is limited to 396. This sets a limit on the maximum area of the picture, which is
only about one quarter of the area of a 720x576 pel picture. In a constrained parameter bitstream, the pel
rate is limited to 2 534 400 pels/s. For a given picture rate, this sets another limit on the maximum area
of the picture. If the picture has the maximum area of 396 macroblocks, then the picture rate is restricted to
25 pictures/s or less. If the picture rate has the maximum constrained value of 30 pictures/s, the maximum
area is limited to 330 macroblocks.

A constrained parameter bitstream can be decoded by a model decoder with a buffer size of 327 680 bits 
without overflowing or underflowing during the decoding process. The maximum buffer size that can be 
specified for a constrained parameter bitstream is 20 units. 

A constrained parameter bitstream uses a forward_f_code or backward_f_code less than or equal to 4. This
constrains the maximum range of motion vectors that can be represented in the bitstream (see table D.7).

If all these conditions are met, then the bitstream is constrained and the constrained_parameters_flag in the
sequence header should be set to 1. If any parameter is exceeded, the flag shall be set to 0 to inform
decoders that more than a minimum capability is required to decode the sequence.
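
The limits listed in this clause can be collected into a single check. This Python sketch takes the limits as stated above; computing the pel rate from whole macroblocks of 256 pels, and the parameter names, are assumptions made for illustration.

```python
import math


def macroblock_count(width, height):
    """Number of 16x16 macroblocks needed to cover the picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def is_constrained(width, height, picture_rate, bit_rate, vbv_buffer_size,
                   forward_f_code, backward_f_code):
    """True if all constrained parameter limits of this clause are met."""
    mbs = macroblock_count(width, height)
    return all((
        width <= 768,
        height <= 576,
        mbs <= 396,
        mbs * 256 * picture_rate <= 2_534_400,  # pel rate in pels/s
        picture_rate <= 30,
        bit_rate <= 1_856_000,                  # bits/s
        vbv_buffer_size <= 20,                  # units of 16 384 bits
        forward_f_code <= 4,
        backward_f_code <= 4,
    ))


print(is_constrained(352, 240, 29.97, 1_150_000, 20, 2, 2))  # True
print(is_constrained(352, 288, 25, 1_150_000, 20, 2, 2))     # True
print(is_constrained(352, 288, 30, 1_150_000, 20, 2, 2))     # False: pel rate
```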

D.5.1.10 Load Intra quantizer matrix 

This is a one-bit flag. If it is set to 1, sixty-four 8-bit integers follow. These define an 8 by 8 set of 
weights which are used to quantize the DCT coefficients. They are transmitted in the zigzag scan order 
shown in figure D.30. None of these weights can be zero. The first weight must be eight, which matches
the fixed quantization level of the dc coefficient.

If the flag is set to zero, the intra quantization matrix must be reset to the following default value: 

8 16 19 22 26 27 29 34 

16 16 22 24 27 29 34 37 

19 22 26 27 29 34 34 38 

22 22 26 27 29 34 37 40 

22 26 27 29 32 35 40 48 

26 27 29 32 35 40 48 58 

26 27 29 34 38 46 56 69 

27 29 35 38 46 56 69 83 

Figure D.15 - Default intra quantization matrix 

The default quantization matrix is based on work performed by ISO/IEC JTC1 SC29/WG10 (JPEG) [6].
Experience has shown that it gives good results over a wide range of video material. For resolutions close 
to 350x250 there should normally be no need to redefine the intra quantization matrix. If the picture 
resolution departs significantly from this nominal resolution, then some other matrix may give perceptibly 
better results. 

The weights increase to the right and down. This reflects the human visual system which is less sensitive 
to quantization noise at higher frequencies. 
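
Because the 64 weights are sent in zigzag order, a decoder has to map them back into an 8 by 8 matrix. Figure D.30 is not reproduced here, so this Python sketch assumes the conventional zigzag scan (the one also used by JPEG), which starts 0, 1, 8, 16, 9, 2 in raster order. It also checks the two rules stated above for an intra matrix.

```python
def zigzag_positions(n=8):
    """(row, column) of each coefficient in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes an anti-diagonal
        low, high = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        rows = range(high, low - 1, -1) if s % 2 == 0 else range(low, high + 1)
        order.extend((r, s - r) for r in rows)
    return order


def matrix_from_zigzag(weights):
    """Rebuild an 8x8 matrix from 64 weights transmitted in zigzag order."""
    if len(weights) != 64:
        raise ValueError("expected 64 weights")
    matrix = [[0] * 8 for _ in range(8)]
    for weight, (r, c) in zip(weights, zigzag_positions()):
        matrix[r][c] = weight
    return matrix


def check_intra_matrix(matrix):
    """Rules above: no weight may be zero, and the first weight must be 8."""
    return matrix[0][0] == 8 and all(w != 0 for row in matrix for w in row)


print([r * 8 + c for r, c in zigzag_positions()[:10]])
# [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```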

D.5.1.11 Load non-intra quantizer matrix 

This is a one-bit flag. If it is set to 1, sixty-four 8-bit integers follow in zigzag scan order. None of these 
integers can be zero. 

If the flag is set to zero, the non-intra quantization matrix must be reset to the following default value 
which consists of all 16s. 
16 16 16 16 16 16 16 16

16 16 16 16 16 16 16 16 

16 16 16 16 16 16 16 16 

16 16 16 16 16 16 16 16 

16 16 16 16 16 16 16 16 

16 16 16 16 16 16 16 16 

16 16 16 16 16 16 16 16 

16 16 16 16 16 16 16 16 



Figure D.16 - Default non-intra quantization matrix
D.5.1.12 Extension data 

This start code is byte-aligned and is 32 bits long. Its value is:

hex: 00 00 01 B5 

binary: 0000 0000 0000 0000 0000 0001 1011 0101 

It may be preceded by any number of zeros. If it is present then it will be followed by an undetermined
number of data bytes terminated by the next start code. These data bytes are reserved for future extensions.

D.5.1.13 User data 

This start code is byte-aligned and is 32 bits long. Its value is:

hex: 00 00 01 B2

binary: 0000 0000 0000 0000 0000 0001 1011 0010
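
All the start codes described here share the 24-bit prefix 00 00 01 and differ only in the final byte. A Python sketch of a scanner that locates them in a byte string. The table holds the values given in this annex plus two that are not listed here and are assumed from the syntax of this part of ISO/IEC 11172: sequence_end_code (B7) and the slice start codes (01 to AF).

```python
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user_data",
    0xB3: "sequence_header",
    0xB5: "extension",
    0xB7: "sequence_end",       # assumed value, not given in this annex
    0xB8: "group_of_pictures",
}
PREFIX = b"\x00\x00\x01"


def find_start_codes(data: bytes):
    """Return (offset_of_prefix, name) for every start code in data."""
    found = []
    i = data.find(PREFIX)
    while i != -1 and i + 3 < len(data):
        value = data[i + 3]
        if 0x01 <= value <= 0xAF:
            name = "slice"          # assumed range, not given in this annex
        else:
            name = START_CODE_NAMES.get(value, "other")
        found.append((i, name))
        i = data.find(PREFIX, i + 4)
    return found


# One stuffing zero, a sequence header code, two data bytes, a GOP and a picture start code.
stream = b"\x00" + b"\x00\x00\x01\xb3" + b"\x12\x34" + b"\x00\x00\x01\xb8" + b"\x00\x00\x01\x00"
print(find_start_codes(stream))
# [(1, 'sequence_header'), (7, 'group_of_pictures'), (11, 'picture')]
```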

D.5.2 Group of pictures 

[...] pictures [...] interspersed [...] of I- or P-pictures, and may [...] followed by [...] picture. The
smallest group of pictures consists of a single I-picture.

The original concept of a group of pictures was a set of pictures that could be coded and displayed
independently of any other group. In the final version of this part of ISO/IEC 11172 this is not always
true, and any B-pictures preceding (in display order) the first I-picture in a group may require the last picture
in the previous group. Nevertheless encoders can still construct groups of pictures which are independent of
one another. One way to do this is to omit any B-pictures preceding the first I-picture. Another way is to
allow such B-pictures, but to code them using only backward motion compensation.

From a coding point of view, a concisely stated property is that a group of pictures begins with a group of
pictures header and ends at the next group of pictures header or at the end of sequence, whichever comes first.
Some examples of groups of pictures are given below:

[Figure: several example groups of pictures, each a sequence of I-, P- and B-pictures in display order]


Figure D.17 - Examples of groups of pictures in display order 

These examples illustrate what is possible, and do not constitute a suggestion for structures of groups of
pictures.

Group of pictures start code 

The group of pictures header starts with a group of pictures start code. This code is byte-aligned and is
32 bits long. Its value is:

hex: 00 00 01 B8

binary: 0000 0000 0000 0000 0000 0001 1011 1000

It may be preceded by any number of zeros. The encoder may have inserted some zeros to get byte
alignment, and may have inserted additional zeros to prevent buffer underflow. An editor may have inserted
zeros in order to match the vbv_delay parameter of the first picture in the group.

Time code 

A time code of 25 bits immediately follows the group of pictures start code. This encodes the same
information as the SMPTE time code [4].

The time code can be broken down into six fields as shown in table D.4.

Table D.4 - Time code fields

FIELD              BITS   VALUES
Drop frame flag    1
Hours              5      0 to 23
Minutes            6      0 to 59
Fixed              1      1
Seconds            6      0 to 59
Picture number     6      0 to 60



The time code refers to the first picture in the group in display order, i.e. the first picture with a temporal
reference of zero. The purpose of the time code is to provide a video time identification to applications.
It may be discontinuous. The presentation time-stamp in the System layer (Part 1) has a much higher
precision and identifies the time of presentation of the picture.
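
A Python sketch of packing and unpacking the 25-bit time code, with the fields in the order of table D.4 (most significant first) and the value ranges given there. The field names are illustrative, and the fixed bit is always written as 1.

```python
TIME_CODE_FIELDS = (  # (name, width in bits, maximum value), from table D.4
    ("drop_frame_flag", 1, 1),
    ("hours", 5, 23),
    ("minutes", 6, 59),
    ("fixed", 1, 1),
    ("seconds", 6, 59),
    ("pictures", 6, 60),
)


def pack_time_code(drop_frame, hours, minutes, seconds, pictures):
    """Assemble the 25-bit time code, checking each field against table D.4."""
    values = {"drop_frame_flag": int(drop_frame), "hours": hours,
              "minutes": minutes, "fixed": 1, "seconds": seconds,
              "pictures": pictures}
    code = 0
    for name, width, maximum in TIME_CODE_FIELDS:
        value = values[name]
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} out of range: {value}")
        code = (code << width) | value
    return code


def unpack_time_code(code):
    """Split a 25-bit time code back into its named fields."""
    fields, shift = {}, 25
    for name, width, _ in TIME_CODE_FIELDS:
        shift -= width
        fields[name] = (code >> shift) & ((1 << width) - 1)
    return fields


print(unpack_time_code(pack_time_code(False, 1, 2, 3, 4)))
```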
Closed GOP

A one bit flag follows the time code. It denotes whether the group of pictures is open or closed. Closed
groups can be decoded without using decoded pictures of the previous group for motion compensation,
whereas open groups require such pictures to be available.
A typical example of a closed group is shown in figure D.18a.

I  B  B  P  B  B  P  B  B  P  B  B  P
0  1  2  3  4  5  6  7  8  9  10 11 12

(a) closed group

B  B  I  B  B  P  B  B  P  B  B  P  B  B  P
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14

(b) open or closed group

Figure D.18 - Example groups of pictures in display order 

A less typical example of a closed group is shown in figure D.18b. In this example, the B-pictures which
precede the first I-picture must use backward motion compensation only, i.e. any motion compensation
must be based only on picture number 2 in the group.

If the closed_gop flag is set to 0, then the group is open. The first B-pictures that precede the first
I-picture in the group may have been encoded using the last picture in the previous group for motion
compensation.

Broken link 

A one bit flag follows the closed_gop flag. It denotes whether the B-pictures which precede the first
I-picture in the GOP can be correctly decoded. If it is set to 1, these pictures cannot be correctly decoded
because the I-picture or P-picture from the previous group of pictures that is required to form the
predictions is not available (presumably because the preceding group of pictures has been removed by
editing). The decoder will probably choose not to display these B-pictures.

If the sequence is edited so that the original group of pictures no longer precedes the current group of
pictures, then this flag normally will be set to 1 by the editor. However, if the closed_gop flag of the
current group of pictures is set, then the editor should not set the broken_link flag. Because the group of
pictures is closed, the first B-pictures (if any) can still be decoded correctly.

Extension data 

This start code is byte-aligned and is 32 bits long. Its value is 

hex: 00 00 01 B5

binary: 0000 0000 0000 0000 0000 0001 1011 0101 

It may be preceded by any number of zeros. If it is present then it will be followed by an undetermined
number of data bytes terminated by the next start code. These data bytes are reserved for future extensions
to ISO/IEC 11172, and should not be generated by encoders. MPEG video decoders should have the
capability to discard any extension data found.

User data 

User data may follow extension data. This start code is byte-aligned and is 32 bits long. Its value is:

hex: 00 00 01 B2

binary: 0000 0000 0000 0000 0000 0001 1011 0010

It may be preceded by any number of zeros. If it is present then it will be followed by an undetermined 
number of data bytes terminated by the next start code. These data bytes can be used by the encoder for any 
purpose. The only restriction on the data is that they cannot emulate a start code, even if not byte-aligned.
This means that a string of 23 consecutive zeros must not occur. One way to prevent emulation is to force
the most significant bit of alternate bytes to be a 1.

In closed encoder-decoder systems the decoder may be able to use the data. In the more general case,
decoders should be capable of discarding the user data.
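
A Python sketch of the two user-data rules above: a test for 23 consecutive zero bits, and the suggested remedy of forcing the most significant bit of alternate bytes to 1, after which no run of zeros inside the data can exceed 15 bits. The sketch only inspects the user data itself, not the bits of the surrounding start codes, and it assumes the payload is designed so that the forced bits carry no information.

```python
def longest_zero_run(data: bytes) -> int:
    """Length of the longest run of consecutive 0 bits, most significant bit first."""
    longest = run = 0
    for byte in data:
        for bit in range(7, -1, -1):
            if (byte >> bit) & 1:
                run = 0
            else:
                run += 1
                longest = max(longest, run)
    return longest


def could_emulate_start_code(data: bytes) -> bool:
    """True if the data contains 23 consecutive zero bits."""
    return longest_zero_run(data) >= 23


def force_alternate_msb(data: bytes) -> bytes:
    """Set the most significant bit of bytes 0, 2, 4, ..."""
    return bytes(b | 0x80 if i % 2 == 0 else b for i, b in enumerate(data))


print(could_emulate_start_code(bytes(8)))                       # True
print(could_emulate_start_code(force_alternate_msb(bytes(8))))  # False
```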

D.5.3 Picture 

The picture layer contains all the information for one picture. The header identifies the temporal
reference of the picture, the picture coding type, the delay in the video buffer verifier (VBV) and, if
appropriate, the range of motion vectors used.

D.5.3.1 Picture header and start code 

A picture begins with a picture header. The header starts with a picture start code. This code is
byte-aligned and is 32 bits long. Its value is:

hex: 00 00 01 00 

binary: 0000 0000 0000 0000 0000 0001 0000 0000 
It may be preceded by any number of zeros. 
D.5.3.2 Temporal reference 

The Temporal Reference is a ten-bit number which can be used to define the order in which the pictures
must be displayed. It may be useful since pictures are not transmitted in display order, but rather in the
order in which the decoder needs to decode them. The first picture, in display order, in each group must have
Temporal Reference equal to zero. This is incremented by one for each picture in the group.

Some example groups of pictures with their Temporal Reference numbers are given below:

Example (a) in display order    I  B  P  B  P
                                0  1  2  3  4
Example (a) in decoding order   I  P  B  P  B
                                0  2  1  4  3

Example (b) in display order    B  B  I  B  B  P  B  B  P  B  B  P
                                0  1  2  3  4  5  6  7  8  9  10 11
Example (b) in coded order      I  B  B  P  B  B  P  B  B  P  B  B
                                2  0  1  5  3  4  8  6  7  11 9  10

Example (c) in display order    B  I  B  B  B  B  P  B  I  B  B  I  I
                                0  1  2  3  4  5  6  7  8  9  10 11 12
Example (c) in coded order      I  B  P  B  B  B  B  I  B  I  B  B  I
                                1  0  6  2  3  4  5  8  7  11 9  10 12

Figure D.19 - Examples of groups of pictures and temporal references

If there are more than 1024 pictures in a group, then the Temporal Reference is reset to zero and then
increments anew. This is illustrated below:

Display order:       [...] ... P B B P ... P B B P
Temporal reference:  0 1 2 3 4 5 ... 1022 1023 0 1 ... 472 473 474 475

Figure D.20 - Example group of pictures containing 1 500 pictures 
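
The relation between display order, coded order and Temporal Reference in figures D.19 and D.20 can be sketched in Python. The reordering rule used here, sending each I- or P-picture ahead of the B-pictures that precede it in display order, is an assumption that reproduces all three examples of figure D.19; the modulo 1024 reflects the ten-bit field and the wrap-around of figure D.20.

```python
def coded_order(display_types):
    """Given picture types in display order (e.g. "BBIBBP"), return a list of
    (type, temporal_reference) in coded order."""
    coded, waiting_b = [], []
    for position, kind in enumerate(display_types):
        entry = (kind, position % 1024)  # Temporal Reference is ten bits
        if kind == "B":
            waiting_b.append(entry)      # must follow the next I- or P-picture
        else:
            coded.append(entry)
            coded.extend(waiting_b)
            waiting_b.clear()
    # Trailing B-pictures would need an anchor from the next group.
    coded.extend(waiting_b)
    return coded


def temporal_references(display_types):
    return [tr for _, tr in coded_order(display_types)]


print(temporal_references("IBPBP"))          # [0, 2, 1, 4, 3]
print(temporal_references("BBIBBPBBPBBP"))   # [2, 0, 1, 5, 3, 4, 8, 6, 7, 11, 9, 10]
print(temporal_references("BIBBBBPBIBBII"))  # [1, 0, 6, 2, 3, 4, 5, 8, 7, 11, 9, 10, 12]
```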
D.5.3.3 Picture coding type

A three-bit number follows the temporal reference. This is an index into the following table defining the
picture type:

Table D.5 - Picture types

CODE   PICTURE TYPE
000    Forbidden
001    I-picture
010    P-picture
011    B-picture
100    D-picture
101    Reserved
110    Reserved
111    Reserved

[...] Codes 101 through 111 are reserved for future extensions to ISO/IEC 11172. Decoders should be capable
of discarding all pictures of this [...] never be [...]

D.5.3.4 VBV delay 

The buffer fullness is not specified in bits but rather in units of time. The vbv delav is a lfi-hit n,m,iw 

For example, suppose the vbv.delay had a decimal value of 30000, then the time delay would be: 

D = 30 000 / 90 000= 1/3 s 
tfthe channel bit rate were 1,2 Mbits/s then the contents of the buffer before the picture is decoded would 

B = 1 200 000 / 3 = 400 000 bits 

The meaning of vbv.delay is undefined for variable bit rate operatioa 
D.5.3.5 Full pel forward vector 

This flag is present only in the headers of P-pictures and B-pictures. It is absent in I-pictures and D-pictures.
D.5.3.6 Forward f-code

The forward_f_code is a 3-bit unsigned integer which, like the full pel forward vector flag, is present only in the headers of P-pictures and B-pictures. It determines the maximum range for the coded forward vectors. The forward_f_code can take only values of 1 through 7; a value of zero is forbidden.

Two parameters used in decoding the forward motion vectors are derived from forward_f_code: forward_r_size and forward_f.

The forward_r_size is one less than the forward_f_code and so can take values 0 through 6.
The forward_f parameter is given by table D.6:

Table D.6 - f_codes

forward/backward f_code   forward/backward f
1                         1
2                         2
3                         4
4                         8
5                         16
6                         32
7                         64
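Table D.6 follows from r_size being one less than f_code and f being 2 raised to the power r_size, so that f doubles with each increment of f_code. A short Python check of that relationship; the function name is hypothetical.

```python
def f_code_parameters(f_code):
    """Return (r_size, f) for a forward_f_code or backward_f_code.

    r_size is one less than f_code, and f = 2 ** r_size, which reproduces
    table D.6.
    """
    if not 1 <= f_code <= 7:
        raise ValueError("f_code must be in 1..7; 0 is forbidden")
    r_size = f_code - 1
    return r_size, 1 << r_size


for code in range(1, 8):
    print(code, f_code_parameters(code))
```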



D.5.3.7 Full pel backward vector 

This is a one bit flag giving the precision of the backward motion vectors. If it is 1, the precision of the vectors is in integer pels; if it is zero, the precision is half a pel. Thus if the flag is set to one, the vectors have twice the range that they have when the flag is set to zero.

This flag is only present in the headers of B-pictures. It is absent in I-pictures, P-pictures and D-pictures.
D.5.3.8 Backward f-code 

The backward_f_code, like the full pel backward vector flag, is present only in the headers of B-pictures.

The backward_f parameter is derived from backward_f_code and is given by table D.6.
D.5.3.9 Extra picture information 

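Extra picture information is the next field in the picture header. Any number of information bytes may be present. An information byte is preceded by a flag bit which is set to 1. Information bytes are therefore generally not byte-aligned. The last information byte is followed by a zero bit. The smallest size of this field is therefore one bit, a 0, that has no information bytes. The largest size is unlimited. The following example has 16 bits of extra information denoted by E:

1EEEEEEEE1EEEEEEEE0

where E is an extra information bit.

The extra information bytes are reserved for future extensions to this part of ISO/IEC 11172. The meaning of these bytes is currently undefined, so encoders must not generate such bytes and decoders must discard them.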
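Because each information byte is introduced by a flag bit and the field ends at the first 0 flag, a decoder can skip the field without knowing its length in advance. A minimal Python sketch, assuming for readability that the bitstream is a string of '0' and '1' characters rather than a real bit reader; the function name is hypothetical.

```python
def read_extra_information(bits, pos=0):
    """Parse an extra information field that starts at bits[pos].

    Returns (information_bytes, position_after_field). A decoder that does
    not understand the bytes discards them and continues at the returned
    position.
    """
    info = []
    while bits[pos] == "1":                       # flag 1: a byte follows
        info.append(int(bits[pos + 1:pos + 9], 2))
        pos += 9
    return info, pos + 1                          # step over the final 0 flag


# The 16-bit example above, with the E bits set to 0xAB and 0xCD.
field = "1" + "10101011" + "1" + "11001101" + "0"
print(read_extra_information(field))   # ([171, 205], 19)
print(read_extra_information("0"))     # ([], 1)
```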

D.5.3.10 Extension data

This start code is byte-aligned and is 32 bits long. Its value is:

hex: 00 00 01 B5

binary: 0000 0000 0000 0000 0000 0001 1011 0101

The extension start code may be preceded by any number of zeros. If it is present, then it will be followed by an undetermined number of data bytes terminated by the next start code. These data bytes are reserved for future extensions to this part of ISO/IEC 11172 and must not be generated by encoders. MPEG video decoders must be capable of discarding them.
D.5.3.11 User data

This start code is byte-aligned and is 32 bits long. Its value is:

hex: 00 00 01 B2

binary: 0000 0000 0000 0000 0000 0001 1011 0010

The user data start code may be preceded by any number of zeros. If it is present, then it will be followed by an undetermined number of data bytes terminated by the next start code. These data bytes can be used by the encoder for any purpose. The only restriction on the data is that they cannot emulate a start code, even if not byte-aligned. One way to prevent emulation is to force the most significant bit of alternate bytes to be a 1.

In some cases a decoder may be able to use the data. In the more general case, decoders should be capable of discarding the user data.
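Forcing the most significant bit of alternate bytes to 1 places a 1 at least once in every 16 bits, so the user data can never contain the 23 consecutive zero bits that begin a start code, whatever the bit alignment. A minimal Python sketch; both function names are hypothetical, and forcing the even-indexed bytes (rather than the odd ones) is an arbitrary choice.

```python
def protect_user_data(payload):
    """Set the most significant bit of every even-indexed byte.

    The longest possible run of zero bits is then 15, shorter than the 23
    zeros that begin a start code. The original value of each forced bit is
    lost, so a protected byte can carry only 7 bits of information.
    """
    return bytes(b | 0x80 if i % 2 == 0 else b for i, b in enumerate(payload))


def emulates_start_code(data):
    """True if 23 zero bits followed by a 1 occur anywhere in data."""
    bits = "".join(format(b, "08b") for b in data)
    return "0" * 23 + "1" in bits


raw = bytes([0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02])
print(emulates_start_code(raw))                     # True
print(emulates_start_code(protect_user_data(raw)))  # False
```

A decoder that wants to use such data must of course know which bits were forced.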



D.5.4 Slice



Pictures are divided into slices. Each slice consists of an integral number of macroblocks in raster scan order. Slices can be of different sizes within a picture, and the division in one picture need not be the same as the division in any other picture. Slices can begin and end at any macroblock in a picture, subject to the following restrictions. The first slice must begin at the top left of the picture, and the end of the last slice must be the bottom right macroblock of the picture. There can be no gaps between slices.

The minimum number of slices in a picture is one; the maximum number is the number of macroblocks.

Each slice begins with a slice start code, the exact value of which defines the vertical position of the slice. This is followed by a code that sets the quantization step-size. At the start of each slice the predictors for the DC coefficient values and the predictors for the motion vector decoding are reset. The horizontal position of the start of the slice is given by the macroblock address of the first macroblock in the slice. The result of all this is that, within a picture, a slice can be decoded without information from the previous slices. Therefore, if a data error occurs, decoding can begin again at the subsequent slice.

If the bitstream is to be used in an error-free environment, then one slice per picture may be appropriate. If the environment is noisy, then one slice per row of macroblocks may be more desirable, as shown in figure D.21.



[Figure: slices 1 to 13, each drawn as a horizontal strip from "n begin" to "end n"]

Figure D.21 - Possible arrangement of slices in a 256x192 picture
In this figure and in the next, each strip is one macroblock high, i.e. 16 pels high. 
Since a slice header costs about 40 bits, there is some cost in including more than the minimum number of slices. For example, a sequence with a vertical resolution of 240 lines coded at 30 pictures/s requires 15 x 30 x 40 = 18 000 bits/s with one slice per row, an additional overhead of 16 800 bits/s compared with one slice per picture. This calculation underestimates the impact, since the inclusion of a slice imposes additional requirements: the macroblock immediately before the slice header must be coded, as well as the first macroblock of the slice.
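The overhead figures above follow from a simple product. A Python sketch that reproduces them, assuming the approximate 40-bit slice header cost used in the text; the names are hypothetical.

```python
SLICE_HEADER_BITS = 40   # approximate slice header cost assumed in the text
MB_HEIGHT = 16           # pels per macroblock row


def slice_header_rate(slices_per_picture, pictures_per_second):
    """Bits per second spent on slice headers."""
    return slices_per_picture * pictures_per_second * SLICE_HEADER_BITS


rows = 240 // MB_HEIGHT                  # 15 macroblock rows
per_row = slice_header_rate(rows, 30)    # one slice per row
minimum = slice_header_rate(1, 30)       # one slice per picture
print(per_row, per_row - minimum)        # 18000 16800
```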

There is great flexibility in dividing a picture up into slices. One possible arrangement is shown in figure D.22.



[Figure: slices 1 to 10 of varying lengths; several slices begin or end part way along a row of macroblocks]



Figure D.22 - Possible arrangement of slices in a 256x192 picture 

These slice arrangements are given for illustrative purposes only.

D.5.4.1 Slice header and start code 

The first field in the slice header is the slice start code. This code is byte-aligned and is 32 bits long. The last eight bits can take on a range of values which define the vertical position of the slice in the picture. The permitted slice start codes are:

hex: from 00 00 01 01

to 00 00 01 AF

binary: from 0000 0000 0000 0000 0000 0001 0000 0001

to 0000 0000 0000 0000 0000 0001 1010 1111

Each slice start code may be preceded by any number of zeros. 
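The start codes described so far (extension data, user data and slices) can be located by searching the byte-aligned data for the prefix 00 00 01 and examining the byte that follows. Extra zero bytes before a start code are harmless, because the prefix is matched on the last two zero bytes before the 01. A minimal Python sketch; the function name and the labels it returns are hypothetical.

```python
def find_start_codes(data):
    """Yield (offset, kind) for each byte-aligned start code 00 00 01 xx."""
    i = data.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(data):
        value = data[i + 3]
        if value == 0xB5:
            kind = "extension data"
        elif value == 0xB2:
            kind = "user data"
        elif 0x01 <= value <= 0xAF:
            kind = "slice, vertical position %d" % value
        else:
            kind = "other start code 0x%02X" % value
        yield i, kind
        i = data.find(b"\x00\x00\x01", i + 3)


# Extension start code, two data bytes, user data start code preceded by an
# extra zero byte, one user data byte, then the slice start code 00 00 01 01.
stream = bytes.fromhex("000001B5 1234 00000001B2 FF 00000101")
for offset, kind in find_start_codes(stream):
    print(offset, kind)
```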

The last eight bits of the slice start code give the slice vertical position, i.e. the vertical position of the first macroblock in the slice in units of macroblocks, starting with position 1 at the top of the picture. A useful variable is macroblock row. This is similar to slice vertical position except that row 0 is at the top of the picture:



slice vertical position = macroblock row + 1 

For example, a slice start code of 00 00 01 01 hex means that the first macroblock in the slice is at vertical position 1 or macroblock row 0, i.e. at the top of the picture. A slice start code of 00 00 01 20 hex means that the first macroblock is at vertical position 32 or macroblock row 31, i.e. starting at pel row 496 (counting pel rows from 0). It is possible for two or more slices to have the same vertical position.
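The mapping from a slice start code to slice vertical position, macroblock row and first pel row can be sketched in Python as follows, assuming 16-pel macroblocks; the function name is hypothetical.

```python
MB_SIZE = 16  # pels per macroblock side


def slice_position(start_code):
    """Interpret a 32-bit slice start code 0x00000101 .. 0x000001AF.

    Returns (slice vertical position, macroblock row, first pel row),
    where macroblock rows and pel rows are counted from 0.
    """
    value = start_code & 0xFF
    if start_code >> 8 != 0x000001 or not 0x01 <= value <= 0xAF:
        raise ValueError("not a slice start code")
    mb_row = value - 1                  # slice vertical position = row + 1
    return value, mb_row, mb_row * MB_SIZE


print(slice_position(0x00000101))   # (1, 0, 0)
print(slice_position(0x00000120))   # (32, 31, 496)
```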

The maximum slice vertical position is 175 (AF hex). A slice with this position would require a picture whose vertical size exceeds 174 x 16 = 2 784 pels.



The horizontal position of the first macroblock in the slice can be calculated from its macroblock address increment. Thus its position in the picture can be determined without referring to any previous macroblock, and a decoder may decode any slice in a picture without having decoded the other slices in the picture.

D.5.4.2 Quantizer scale

The quantizer scale is a five-bit integer which is used by the decoder to calculate the DCT coefficients from the transmitted quantized coefficients. A value of 0 is forbidden, so the quantizer scale can take values 1 through 31.

Note in addition that the quantizer scale may be set at any macroblock. 
D.5.4.3 Extra slice information 

Extra slice information forms the last field in the slice header. Any number of information bytes may be present. An information byte is preceded by a flag bit which is set to 1. Information bytes are therefore generally not byte-aligned. The last information byte is followed by a zero bit. The smallest size of this field is therefore one bit, a 0, that has no information bytes. The largest size is unlimited. The following example has 24 bits of extra information denoted by E:

1EEEEEEEE1EEEEEEEE1EEEEEEEE0

The extra information bytes are reserved for future extensions to this part of ISO/IEC 11172. The meaning of these bytes is currently undefined, so encoders must not generate such bytes and decoders must discard them.

The slice header is followed by code defining the macroblocks in the slice.
D.5.5 Macroblock 

Slices are divided into macroblocks of 16 x 16 pels. Macroblocks are coded with a header that contains information on the macroblock address, macroblock type, and the optional quantizer scale. The header is followed by data defining each of the six blocks in the macroblock. It is convenient to discuss the macroblock header fields in the order in which they are coded.

D.5.5.1 Macroblock stuffing 

The first field in the macroblock header is "macroblock stuffing". This is an optional field, and may be inserted or omitted at the discretion of the encoder. If present, it consists of any number of 11-bit strings with the pattern "0000 0001 111". This stuffing code is used by the encoder to prevent underflow, and is discarded by the decoder. If the encoder determines that underflow is about to occur, then it can insert as many stuffing codes into the first field of the macroblock header as it likes.

Note that an encoder has other strategies to prevent buffer underflow. It can insert stuffing bits immediately 
before a start code. It can reduce the quantizer scale to increase the number of coded coefficients. It can even 
start a new slice. 

D.5.5.2 Macroblock address increment and macroblock escape 

Macroblocks have an address which is the number of the macroblock in raster scan order. The top left 
macroblock in a picture has address 0, the next one to the right has address 1 and so on. If there are M 
macroblocks in a picture, then the bottom right macroblock has an address M-1.
The address of a macroblock is indicated by transmitting the difference between the addresses of the current macroblock and the previously coded macroblock. This difference is called the macroblock address increment. In I-pictures, all macroblocks are coded and so the macroblock address increment is nearly always one. There is one exception. At the beginning of each slice the macroblock address is set to that of the right hand macroblock of the previous row. At the beginning of the picture it is set to -1. If a slice does not start at the left edge of the picture, then the macroblock address increment for the first macroblock in the slice will be larger than one. For example, the picture of figure D.22 has 16 macroblocks per row. At the start of slice 2 the macroblock address is set to 15, which is the address of the macroblock at the right hand edge of the top row of macroblocks. If the first slice contained 26 macroblocks, 10 of them would be in the second row, so the address of the first macroblock in slice 2 would be 26 and the macroblock address increment would be 11.
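The rule for the first increment of a slice can be sketched in Python. The sketch assumes that the slice vertical position is known from the slice start code; the function name is hypothetical.

```python
def first_increment(mb_width, vertical_position, first_address):
    """Macroblock address increment coded for the first macroblock of a slice.

    At the start of a slice the previous macroblock address is set to the
    right-hand macroblock of the row above the slice, which is -1 for a
    slice starting in the top row (vertical position 1).
    """
    previous = (vertical_position - 1) * mb_width - 1
    return first_address - previous


# Slice 2 of figure D.22: 16 macroblocks per row, slice 1 holds 26 macroblocks,
# so slice 2 starts at address 26 in the second row (vertical position 2).
print(first_increment(16, 2, 26))   # 11
```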

Macroblock address increments are coded using the VLC codes in the table in B.1.

It can be seen that there is no code to indicate a macroblock address increment of zero. This is why the 
macroblock address is set to -1 rather than zero at the top of a picture. The first macroblock will have an 
increment of one making its address equal to zero. 

The macroblock address increments allow the position of the macroblock within the picture to be determined. For example, assume that a slice header has the start code equal to 00 00 01 0A hex, that the picture width is 256 pels, and that the macroblock address increment code 0000 111 is in the macroblock header of the first macroblock in the slice. A picture width of 256 pels implies that there are 16 macroblocks per row in this picture. The slice start code tells us that the slice vertical position is 10, and so the macroblock row is 9. The slice header sets the previous macroblock address to the last macroblock on row 8, which has address 143. The macroblock address increment VLC leads to a macroblock address increment of 8, and so the macroblock address of the first macroblock in the slice is 143 + 8 = 151.

The macroblock row may be calculated from the address: 

macroblock row = macroblock address / macroblock width
= 151 / 16
= 9

The division symbol signifies integer truncation, not rounding. 

The macroblock column may also be calculated from the address: 

macroblock column = macroblock address % macroblock width 
= 151 % 16 
= 7 

Columns are numbered from the left of the picture starting at 0. 
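Both calculations correspond to Python's `divmod`, which truncates for the non-negative addresses used here. A short sketch reproducing the worked example; the helper name is hypothetical.

```python
def mb_position(address, mb_width):
    """Return (row, column) of a macroblock address; both count from 0."""
    return divmod(address, mb_width)   # truncating division for address >= 0


picture_width = 256
mb_width = picture_width // 16         # 16 macroblocks per row
address = 143 + 8                      # previous address + decoded increment
print(address, mb_position(address, mb_width))   # 151 (9, 7)
```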
There are two special codewords: escape and stuffing. 

The escape code means "add 33 to the following macroblock address increment". This allows increments greater than 33 to be coded. For example, an increment of 40 would be coded as escape plus an increment of 7:

0000 0001 0000 0010

An increment of 70 would be coded as two escape codes followed by the code for an increment of 4:

0000 0001 0000 0000 0010 0000 11

The stuffing code is included since the decoder must be able to distinguish it from increment codes. It is 
used by the encoder to prevent underflow, and is discarded by the decoder. 
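Splitting an increment into escapes and a remainder between 1 and 33 reproduces both bit strings above. In the Python sketch below, the escape code 0000 0001 000 and the codes for increments of 4 and 7 are the ones contained in those two bit strings; the table is deliberately partial, and a real encoder would use the complete table of B.1. Function names are hypothetical.

```python
ESCAPE = "00000001000"   # macroblock escape: adds 33 to the increment

# Only the entries needed for the two examples above.
PARTIAL_INCREMENT_VLC = {4: "0011", 7: "00010"}


def split_increment(increment):
    """Return (escapes, remainder) with remainder in 1..33."""
    if increment < 1:
        raise ValueError("increments start at 1")
    escapes = (increment - 1) // 33
    return escapes, increment - 33 * escapes


def encode_increment(increment, table=PARTIAL_INCREMENT_VLC):
    escapes, remainder = split_increment(increment)
    return ESCAPE * escapes + table[remainder]


print(encode_increment(40))   # 0000000100000010
print(encode_increment(70))   # 00000001000000000010000011
```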
D.5.5.3 Macroblock types

Each of the picture types I, P and B has its own macroblock types. See D.6.3, D.6.4 and D.6.5, respectively, for the codes and their descriptions.

D.5.5.4 Motion horizontal/vertical forward/backward codes 

The interpretation of these codes is explained in D.6.2.3. 

D.5.5.5 Motion horizontal/vertical forward/backward R 

The interpretation of these codes is explained in D.6.2.3. 

D.5.5.6 Coded block pattern 

The coded block pattern indicates which of the blocks within the macroblock are transmitted.
D.5.5.7 End of macroblock

This code is used only in D-pictures and is described in D.6.6.

D.5.6 Block 

A block is an array of 8 by 8 component pel values, treated as a unit and input to the Discrete Cosine Transform (DCT). Blocks of 8 by 8 pels are transformed into arrays of 8 by 8 DCT coefficients using the two-dimensional discrete cosine transform.

D.6 Coding MPEG video 

D.6.1 Rate control and adaptive quantization 

The encoder must control the bit rate so that the model decoder input buffer neither overflows nor underflows. Since the model decoder removes all the bits associated with a picture from the input buffer instantaneously, it is necessary to control only the number of bits per picture. The encoder should allocate the total number of bits among the various types of pictures so that the perceived quality is as good as possible for the number of bits available.

The main way in which the encoder controls the bit rate is to vary the quantizer scale. This is set in each slice header and may be set at the beginning of any macroblock, giving the encoder excellent control over the bit rate within a picture.

D.6.1.1 Rate control within a sequence

For a typical coding scheme represented by the following group of pictures in display order 

BBIBBPBBPBBPBBP 

it has been found that good results can be obtained by matching the visual quality of the I and P-pictures 
and by reducing the code size of the B-pictures to save bits giving a generally lower quality for the B- 
pictures. 

The best allocation of bits among the picture types depends on the scene content. Work of the MPEG committee suggests that allotting P-pictures about 2 to 5 times as many bits as B-pictures, and allotting I-pictures a larger share still than P-pictures, gives good results for typical scenes. If there is little motion or change in the video, then a greater proportion of the bits should be allotted to the I-pictures.

The encoder can start with the foregoing allocation of bits and then adjust it dynamically.

D.6.1.2 Rate control within a picture 

If the encoder is heading toward overflow, the quantizer scale should be increased. If this action is not sufficient to prevent an impending overflow then, as a last resort, the encoder could discard high frequency DCT coefficients. Although this would cause visible artifacts in the decoded video, it would in no way compromise the validity of the coded bitstream.

If the encoder is heading toward underflow, the quantizer scale should be reduced. If this is not sufficient, the encoder can insert macroblock stuffing into the bitstream, or add leading zeros to start codes.

Under normal circumstances, the encoder calculates and monitors the state of the model decoder buffer and changes the quantizer scale to avert both overflow and underflow problems.

One method that helps accomplish this is to monitor the buffer fullness. Assume that the bits have been allocated among the various picture types, and that an average quantizer scale for each picture type has been determined. Then the expected buffer fullness at any macroblock in a picture can be calculated and compared with the nominal fullness, i.e. the value that would be obtained if the bits were uniformly distributed among all the macroblocks in the picture. If the buffer fullness is larger than the nominal value, the quantizer scale should be set higher than the average, whereas if the buffer fullness is lower than the nominal, the quantizer scale should be set lower than the average.
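The comparison with the nominal fullness can be turned into a simple per-macroblock rule. The Python sketch below is a hypothetical illustration, not a method defined by this part of ISO/IEC 11172: it interprets "buffer fullness" as the number of bits the encoder has generated so far for the picture, and it scales the average quantizer scale by the ratio of actual to nominal fullness, clipped to the legal range 1 to 31.

```python
def macroblock_quantizer(avg_q, bits_so_far, mb_index, mb_count, picture_bits):
    """Hypothetical proportional rule for the quantizer scale of one macroblock.

    avg_q        -- average quantizer scale chosen for this picture type
    bits_so_far  -- bits generated for the picture before this macroblock
    mb_index     -- number of macroblocks already coded in the picture
    mb_count     -- macroblocks in the picture
    picture_bits -- bits allocated to the picture
    """
    nominal = picture_bits * mb_index / mb_count   # uniform distribution of bits
    if nominal == 0:
        return avg_q                               # first macroblock: no history yet
    q = round(avg_q * bits_so_far / nominal)
    return max(1, min(31, q))                      # quantizer scale is 1..31


# Halfway through a 396-macroblock picture allotted 40 000 bits:
print(macroblock_quantizer(10, 24_000, 198, 396, 40_000))  # more bits than nominal: 12
print(macroblock_quantizer(10, 16_000, 198, 396, 40_000))  # fewer bits than nominal: 8
```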

If the quantizer scale is held constant over a picture then, for a given number of coding bits, the total mean square error of the coded picture will tend to be close to the minimum. However, the visual appearance of most pictures can be improved by varying the quantizer scale over the picture, making it smaller in smooth areas of the picture and larger in busy areas. This technique reduces the visibility of blockiness in smooth areas, at the expense of increased quantization noise in the busy areas where, however, it is masked by the image detail.

An encoder can therefore adjust the quantizer scale within a picture depending on the local picture content.

D.6.1.3 Buffer fullness 

To maximize the visual quality, the encoder should almost fill the input buffer before instructing the decoder to start decoding.

D.6.2 Motion estimation and compensation 
D.6.2.1 Motion compensation 

MPEG video coding uses motion compensation to exploit temporal redundancy in the video. Decoders construct a predicted block of pels from pels in a previously transmitted picture. Motion within the pictures (e.g. a pan) usually implies that the pels in the previous picture will be in a different position from the pels in the current block, and the displacement is given by motion vectors encoded in the bitstream. If the predicted block is a good estimate of the current block, it is usually more efficient to transmit the motion vector and the difference between the predicted block and the current block than to transmit a description of the current block by itself.

Consider the following typical group of pictures. 
Figure D.23 - Group of pictures in display order

In figure D.23, a P-picture uses forward motion compensation from picture 5. P-pictures always use forward motion compensation from the last transmitted I- or P-picture.

B-pictures may use motion compensation from the previous I- or P-picture, from the next (in display order) I- or P-picture, or both; i.e. from the last two transmitted I- or P-pictures.

Prediction is called forward if reference is made to a picture in the past, and backward if reference is made to a picture in the future. B-pictures may use both forward and backward motion compensation and average the result. This operation is called interpolated motion compensation.

All three types of motion compensation are useful, and typically all are used in coding B-pictures.
D.6.2.2 Motion estimation 

Motion compensation in a decoder is straightforward, but motion estimation, which includes determining the motion vectors and which must be performed by the encoder, presents a formidable problem.



The more computationally intensive methods of motion estimation tend to give better results, so a trade-off must be made between the computational power of the encoder, and hence its cost, and the picture quality.

Using a search strategy, the encoder attempts to match the pels in a macroblock with those in a previous or future picture. The vector corresponding to the best match is reported after the search is completed.

D.6.2.2.1 Block matching criteria

In seeking a match, the encoder must decide whether to use the decoded past and future pictures as the references, or the original past and future pictures. For motion estimation, use of the decoded pictures by the encoder tends to give the smallest prediction error picture, whereas use of the original pictures gives the most accurate representation of the true motion. The choice depends on whether the artifacts of increased noise, or of greater spurious motion, are more objectionable. There is usually little or no difference in quality between the two methods. Note that the decoder does not perform motion estimation. It performs motion-compensated prediction and interpolation using vectors calculated in the encoder and stored in the bitstream. In motion-compensated prediction and interpolation, both the encoder and decoder must use the decoded pictures as the references.

Several block matching criteria are available. The mean square error of the difference between the motion-compensated block and the current block is an obvious choice. Another possible criterion is the mean absolute difference between the motion-compensated block and the current block.

For half-pel shifts, the pel values could be interpolated by several methods. Since the decoder uses a simple linear interpolation, there is little reason to use a more complex method in the encoder. The linear interpolation method given in this part of ISO/IEC 11172 is equivalent to the following. Consider four pels having values A, B, D and E as shown in figure D.24:

A   h   B

v   c

D       E

Figure D.24 - Interpolation of half pel shifts 
The value of the horizontally interpolated pel is 

h = (A + B)//2 

where the double division symbol means division with rounding to the nearest integer. Half-integer values are to be rounded to the next higher value. Thus if A = 4 and B = 9, then h = 6.5, which is rounded up to 7.

The value of the vertically interpolated pel is 

v = (A + D)//2 

The value of the central interpolated pel is 

c = (A + B + D + E)//4 
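For non-negative pel values, the "//" rounding can be done in integer arithmetic by adding half the divisor before an integer division. The Python sketch below applies the three formulas to a picture stored as a list of rows, addressing positions in half-pel units; the function name is hypothetical.

```python
def half_pel_sample(pic, y2, x2):
    """Pel value at (y2 / 2, x2 / 2), with coordinates given in half-pel units.

    pic is a list of rows of non-negative integer pel values. Adding half the
    divisor before integer division rounds to the nearest integer, with
    halves rounded up.
    """
    y, x = y2 // 2, x2 // 2
    half_y, half_x = y2 % 2, x2 % 2
    if half_y and half_x:                       # c = (A + B + D + E)//4
        return (pic[y][x] + pic[y][x + 1] + pic[y + 1][x] + pic[y + 1][x + 1] + 2) // 4
    if half_x:                                  # h = (A + B)//2
        return (pic[y][x] + pic[y][x + 1] + 1) // 2
    if half_y:                                  # v = (A + D)//2
        return (pic[y][x] + pic[y + 1][x] + 1) // 2
    return pic[y][x]                            # integer position: A itself


pic = [[4, 9],
       [10, 10]]    # A = 4, B = 9, D = 10, E = 10
print(half_pel_sample(pic, 0, 1))   # h = 7 (6.5 rounded up)
print(half_pel_sample(pic, 1, 0))   # v = 7
print(half_pel_sample(pic, 1, 1))   # c = 8 (33/4 = 8.25)
```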

D.6.2.2.2 Search range

Once the block matching criterion has been selected, some kind of search strategy must be adopted. This must recognize the limitations of the VLC tables used to code the vectors. The maximum range of the vector depends upon forward_f_code or backward_f_code. The motion vector ranges are given in table D.7.
Table D.7 - Range of motion vectors

forward_f_code or    Motion vector range
backward_f_code      full_pel = 0       full_pel = 1
1                    -8 to 7,5          -16 to 15
2                    -16 to 15,5        -32 to 31
3                    -32 to 31,5        -64 to 63
4                    -64 to 63,5        -128 to 127
5                    -128 to 127,5      -256 to 255
6                    -256 to 255,5      -512 to 511
7                    -512 to 511,5      -1 024 to 1 023



The range depends on the value of full_pel_forward_vector or full_pel_backward_vector in the picture header. Thus if all the motion vectors were found to be 15 pels or less in magnitude, the encoder would usually select half-pel accuracy and a forward_f_code or backward_f_code value of 2.
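Table D.7 and the choice of f_code can be expressed compactly, since the full-pel range is -16f to 16f - 1 pels and the half-pel range is half of that, where f = 2 to the power (f_code - 1) as in table D.6. A Python sketch; picking the smallest adequate f_code is an assumption about what an encoder "would usually select", and the function names are hypothetical.

```python
def mv_range(f_code, full_pel):
    """(minimum, maximum) motion vector in pels, reproducing table D.7."""
    f = 1 << (f_code - 1)
    if full_pel:
        return -16 * f, 16 * f - 1
    return -8.0 * f, 8.0 * f - 0.5      # half-pel steps


def smallest_f_code(components, full_pel=False):
    """Smallest f_code whose range covers all vector components (in pels)."""
    lo_needed, hi_needed = min(components), max(components)
    for f_code in range(1, 8):
        lo, hi = mv_range(f_code, full_pel)
        if lo <= lo_needed and hi_needed <= hi:
            return f_code
    raise ValueError("vectors exceed the largest range in table D.7")


print(mv_range(1, False), mv_range(7, True))   # (-8.0, 7.5) (-1024, 1023)
print(smallest_f_code([-15, 3.5, 15]))         # 2
```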

The search must be constrained to take place within the boundaries of the decoded reference picture. Motion 
vectors which refer to pels outside the picture are not allowed. Any bitstream which refers to such pels does 
not conform to this part of ISO/IEC 11172. 

D.6.2.2.3 2-D search strategy 

There are many possible methods of searching another picture for the best match to a current block, and a 
few simple ones will be described. 

The simplest search is a full search. Within the chosen search range all possible displacements are
evaluated using the block matching criterion. 

The full search is computationally expensive, and practical encoders may not be able to afford the time 
required for a full search. 

A simple modification of the full search is to search using only integer pel displacements. Once the best integer match has been found, the eight neighbouring half-integer pel displacements are evaluated, and the best one is selected, as illustrated in figure D.25.
Figure D.25 - Integer pel and half pel displacements

Assume that the position (x+2, y+2) gives the best integer displacement match using the selected block matching criterion. The encoder would then evaluate the eight positions with half-pel displacements marked by + signs in figure D.25. If one of them were a better match, it would become the motion vector; otherwise the motion vector would remain that of the integer displacement (x+2, y+2).

If during the integer pel search, two or more positions have the same block matching value, the encoder can 
adopt a consistent tie-breaking rule. 

The modified full search algorithm is approximately an order of magnitude simpler than the full search. 
Using only integer displacements for the first stage of the search reduces the number of evaluations by a 
factor of four. In addition, the evaluations are simpler since the pel differences can be calculated directly and 
do not have to be interpolated. 
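The two block matching criteria of D.6.2.2.1 and the integer-pel stage of the search can be sketched in Python as follows. The half-pel refinement would then evaluate the eight neighbouring half-pel positions using the interpolation of figure D.24. The small 4 x 4 block, the synthetic pictures and the function names are hypothetical choices for illustration.

```python
def block_cost(cur, ref, bx, by, dx, dy, n, squared=False):
    """Mean absolute (or, if squared, mean square) difference between the
    n x n block of cur at (bx, by) and the block of ref displaced by (dx, dy)."""
    total = 0
    for j in range(n):
        for i in range(n):
            d = cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
            total += d * d if squared else abs(d)
    return total / (n * n)


def full_search(cur, ref, bx, by, n, search_range, squared=False):
    """Exhaustive integer-pel search; returns (cost, dx, dy) of the best match.

    Displacements that would reach outside ref are skipped, since vectors may
    not refer to pels outside the picture. Ties keep the first candidate found
    in raster order, which is one consistent tie-breaking rule.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = block_cost(cur, ref, bx, by, dx, dy, n, squared)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic pictures: the current picture is the reference moved 2 pels right
# and 1 pel up, so the block at (6, 6) is found at displacement (-2, +1).
W = H = 16
ref = [[y * W + x for x in range(W)] for y in range(H)]
cur = [[ref[y + 1][x - 2] if x >= 2 and y + 1 < H else 0 for x in range(W)]
       for y in range(H)]
print(full_search(cur, ref, 6, 6, 4, 3))   # (0.0, -2, 1)
```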
For some applications even the modified full search may be too time consuming, and a faster search method may be required. One such method is the logarithmic search.

D.6.2.2.4 Logarithmic search 

In this search method, grids of 9 displacements are examined, and the search continued based on a smaller 
grid centered on the position of the best match. If the grids are reduced in size by a factor of 3 at each step 
then the search is maximally efficient in the sense that any integer shift has a unique selection path to it.
This method will find the best match only for a rather limited set of image types. A more robust method is 
to reduce the size of the grids by a smaller factor at each step, e.g. by a factor of 2. The scaling factors can 
also be adjusted to match the search ranges of table D.7. 

The method will be illustrated with an example. Consider the set of integer shifts in figure D.26: 
Figure D.26 - Logarithmic search method for integer pel shifts

The first grid has a spacing of 4 pels. The first step examines pels at shifts of 0, 4, or -4 pels in each
direction, marked 1 in figure D.26. The best position is used as the center point of the second grid. 
Assume it is the pel marked 1 directly to the left of the center pel. The second grid has a spacing of 2 pels. 
The second step examines pels at shifts of 0, 2, or -2 pels in each direction from the center of the new grid, 
marked 2 in the figure. The best position is used as the center point of the third grid; assume it is the lower
right pel of the second grid. The third grid has a spacing of 1 pel. The third step examines pels at shifts of 
0, 1, or -1 pels in each direction from the center of the grid. The best position is used as the center point of 
the fourth grid. The fourth grid has a spacing of 1/2 pel. The fourth step examines pels at shifts of 0, 1/2, 
or -1/2 pels in each direction from the center of the grid using the same method as in the modified full 
search. The best position determines the motion vector. 
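The search sequence just described can be sketched in Python. In this hypothetical sketch the block matching criterion is passed in as a function `cost(dx2, dy2)` of the displacement in half-pel units, so the grid spacings 4, 2, 1 and 1/2 pel become 8, 4, 2 and 1; a real encoder would evaluate the criterion on pel data, interpolating at half-pel positions. The synthetic cost in the usage example is an assumption chosen so that the answer is easy to check.

```python
def logarithmic_search(cost, spacings=(8, 4, 2, 1)):
    """Logarithmic search over displacements measured in half-pel units.

    Each step examines the 3 x 3 grid of the given spacing centred on the
    best position found so far. The default spacings of 8, 4, 2 and 1
    half-pels correspond to 4, 2, 1 and 1/2 pel (first row of table D.8).
    """
    best_x, best_y = 0, 0
    best_cost = cost(0, 0)
    for step in spacings:
        centre_x, centre_y = best_x, best_y
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                c = cost(centre_x + dx, centre_y + dy)
                if c < best_cost:          # strict: ties keep the earlier position
                    best_cost = c
                    best_x, best_y = centre_x + dx, centre_y + dy
    return best_x, best_y, best_cost


# Synthetic criterion whose minimum is at (-5, +3) pels, i.e. (-10, +6) half-pels.
print(logarithmic_search(lambda x2, y2: (x2 + 10) ** 2 + (y2 - 6) ** 2))
# (-10, 6, 0)
```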

Some possible grid spacings for various search ranges are given in table D.8. 

Table D.8 - Grid spacings for logarithmic searches

forward_f_code   Range    Steps   Grid spacings
1                ±7,5     4       4 2 1 1/2
2                ±15,5    5       8 4 2 1 1/2
3                ±31,5    6       16 8 4 2 1 1/2



For P-pictures only forward searches are performed, but B-pictures require both forward and backward 
searches. Not all the vectors calculated during the search are necessarily used. In B-pictures either forward 
or backward motion compensation might be used instead of interpolated motion compensation, and in both 
P and B-pictures the encoder might decide that a block is better coded as intra, in which case no vectors are 
transmitted. 

D.6.2.2.5 Telescopic search 

Even with the faster methods of the modified full search, or the logarithmic search, the search might be 
quite expensive. For example, if the encoder decides to use a maximum search range of 7 pels per picture 
interval, and if there are 4 B-pictures preceding a P-picture, then the full search range for the P-picture would
be 35 pels. This large search range may exceed the capabilities of the encoder. 

One way of reducing the search range is to use a telescopic search technique. This is best explained by illustrating with an example. Consider the group of pictures in figure D.27.

I  B  B  B  P  B  B  B  P  B  B  B  P
0  1  2  3  4  5  6  7  8  9  10 11 12

Figure D.27 — Example group of pictures in display order 

The encoder might proceed using its selected block matching criterion and 2-D search strategy. For each P-
picture and the preceding B-pictures, it first calculates all the forward vectors, then calculates all the 
backward vectors. The first set of pictures consists of pictures 0 through 4. 

To calculate the complete set of forward vectors, the encoder first calculates all the forward vectors from
picture 0 to picture 1 using a 2-D search strategy centered on zero displacement. It next calculates all the
forward vectors from picture 0 to picture 2 using a 2-D search strategy centered on the displacements 
calculated for the corresponding block of picture 1. It next calculates all the forward vectors from picture 0 
to picture 3 using a 2-D search strategy centered on the displacements calculated for the corresponding block 
of picture 2. Finally, it calculates all the forward vectors from picture 0 to picture 4 using a 2-D search 
strategy centered on the displacements calculated for the corresponding block of picture 3. 

To calculate the complete set of backward vectors, the encoder first calculates all the backward vectors from 
picture 4 to picture 3 using a 2-D search strategy centered on zero displacement. It next calculates all the
backward vectors from picture 4 to picture 2 using a 2-D search strategy centered on the displacements 
calculated for the corresponding block of picture 3. Finally, it calculates all the backward vectors from 
picture 4 to picture 1 using a 2-D search strategy centered on the displacements calculated for the 
corresponding block of picture 2. 
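The chaining of search centres can be sketched in Python. For brevity the sketch tracks a single block, and `windowed_search` is a hypothetical stand-in for a real 2-D search: it models a scene moving (2, 1) pels per picture interval and succeeds only within plus or minus 7 pels of its centre. Each search is centred on the vector found for the previous picture in the chain, exactly as in the procedure above.

```python
def telescopic_vectors(search, anchor, pictures):
    """Chain 2-D searches from picture `anchor` to each entry of `pictures`.

    search(ref, cur, centre) must return the vector for cur relative to ref,
    looking only in a small window around `centre`.
    """
    vectors, centre = [], (0, 0)
    for picture in pictures:
        centre = search(anchor, picture, centre)
        vectors.append(centre)
    return vectors


def windowed_search(ref, cur, centre, radius=7):
    """Hypothetical search: true motion is (2, 1) pels per picture interval."""
    true_vector = (2 * (cur - ref), 1 * (cur - ref))
    if max(abs(t - c) for t, c in zip(true_vector, centre)) > radius:
        raise ValueError("motion outside the search window")
    return true_vector


print(telescopic_vectors(windowed_search, 0, [1, 2, 3, 4]))  # forward vectors
print(telescopic_vectors(windowed_search, 4, [3, 2, 1]))     # backward vectors
```

A direct search from picture 0 to picture 4 centred on zero displacement would fail in this model, because its vector (8, 4) lies outside the +/-7 pel window; the telescopic chain reaches it in steps of 2 pels.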

Further methods of motion estimation are given by Netravali and Haskell [1]. 
D.6.2.3 Coding of motion vectors 

The motion vector of a macroblock tends to be well correlated with the vector of the previous macroblock. 
For example, in a pan all vectors would be roughly the same. Motion vectors are coded using a DPCM 
technique to make use of this correlation. 

In P-pictures the motion vector used for DPCM, the prediction vector, is set to zero at the start of each slice 
and at each intra-coded macroblock. Note that macroblocks which are coded as predictive but which have no 
motion vector, also set the prediction vector to zero. 

In B-pictures there are two motion vectors, forward and backward. Each vector is coded relative to the 
predicted vector of the same type. Both motion vectors are set to zero at the start of each slice and at each 
intra-coded macroblock. Note that predictive macroblocks which have only a forward vector do not affect 
the value of the predicted backward vector. Similarly, predictive macroblocks which have only a backward 
vector do not affect the value of the predicted forward vector. 

The range of the vectors is set by two parameters. The full_pel_forward_vector and full_pel_backward_vector flags in the picture header determine whether the vectors are defined in half-pel or integer-pel units.

A second parameter, forward_f_code or backward_f_code, is related to the number of bits appended to the VLC codes in table D.9.
Table D.9 - Differential motion code

VLC code        Value
0000 0011 001   -16
0000 0011 011   -15
0000 0011 101   -14
0000 0011 111   -13
0000 0100 001   -12
0000 0100 011   -11
0000 0100 11    -10
0000 0101 01    -9
0000 0101 11    -8
0000 0111       -7
0000 1001       -6
0000 1011       -5
0000 111        -4
0001 1          -3
0011            -2
011             -1
1               0
010             1
0010            2
0001 0          3
0000 110        4
0000 1010       5
0000 1000       6
0000 0110       7
0000 0101 10    8
0000 0101 00    9
0000 0100 10    10
0000 0100 010   11
0000 0100 000   12
0000 0011 110   13
0000 0011 100   14
0000 0011 010   15
0000 0011 000   16



Advantage is taken of the fact that the range of displacement vector values is constrained. Each VLC 
represents a pair of difference values. Only one of the pair will yield a motion vector falling within the 
permitted range. 

The range of the vector is limited to the values shown in table D.7. The values obtained by decoding the
differential values are kept within this range by adding or subtracting a modulus, which depends on the f value
as shown in table D.10.



Table D.10 - Modulus for motion vectors

forward_f_code or backward_f_code   MODULUS
1                                     32
2                                     64
3                                    128
4                                    256
5                                    512
6                                  1 024
7                                  2 048
The use of the modulus, which refers only to the numbers in tables D.8 through D.10, will be illustrated
by an example. Assume that a slice has the following vectors, expressed in the units set by the full pel
flag.

3 10 30 30 -14 -16 27 24

The range is such that an f value of 2 can be used. The initial prediction is zero, so the differential values
are

3 7 20 0 -44 -2 43 -3

The differential values are reduced to the range -32 to +31 by adding or subtracting the modulus 64
corresponding to the forward_f_code of 2.

3 7 20 0 20 -2 -21 -3

To create the codeword, (mvd + sign(mvd) * (forward_f - 1)) is divided by forward_f. The signed quotient of
this division is used to find a variable length codeword from table D.9. Then the absolute value of the
remainder is used to generate a fixed length code that is concatenated with the variable length code; no
remainder is appended when the quotient is zero. The codes generated by this example are shown below:



VALUE   VLC CODE
3       0010 0
7       0000 1100
20      0000 0100 101
0       1
20      0000 0100 101
-2      0111
-21     0000 0100 0110
-3      0011 0
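The DPCM, modulus and codeword steps above can be collected into one small function. This is a minimal sketch, not the normative procedure: the names `encode_mv_diff` and `mv_code` are hypothetical, and it assumes the relation f = 2^(f_code - 1) between the f value and forward_f_code (consistent with table D.10, where the modulus is 32 times f). Looking up `motion_code` in table D.9 and appending `motion_r` as f_code - 1 bits should reproduce the codes listed above.

```c
#include <assert.h>
#include <stdlib.h>

/* One coded component of a differential motion vector. */
typedef struct {
    int motion_code;  /* signed index into table D.9 */
    int motion_r;     /* remainder, sent as a fixed-length code */
    int has_r;        /* 0 when no remainder is sent (f == 1 or motion_code == 0) */
} mv_code;

/* Hypothetical helper: wraps a differential into [-16*f, 16*f - 1] using the
   modulus 32*f of table D.10, then splits it into a VLC index and a remainder.
   Assumes f = 1 << (f_code - 1). */
mv_code encode_mv_diff(int mvd, int f_code)
{
    assert(f_code >= 1 && f_code <= 7);
    int f = 1 << (f_code - 1);
    int modulus = 32 * f;                 /* 32, 64, ..., 2048 */

    if (mvd < -16 * f)
        mvd += modulus;
    else if (mvd > 16 * f - 1)
        mvd -= modulus;

    mv_code out = {0, 0, 0};
    if (mvd == 0)
        return out;                       /* VLC "1", nothing appended */

    int sign = (mvd > 0) ? 1 : -1;
    int num = mvd + sign * (f - 1);       /* mvd + sign(mvd) * (f - 1) */
    out.motion_code = num / f;            /* C99 division truncates toward zero */
    out.motion_r = abs(num % f);          /* absolute value of the remainder */
    out.has_r = (f != 1);
    return out;
}
```

For example, with f_code 2 the raw differential -44 is first wrapped to 20 and then coded as motion_code 10 with remainder 1.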



D.6.3 Coding I-pictures

In coding I-pictures, the encoder has two main decisions to make that are not mandated by this part of
ISO/IEC 11172. These are: how to divide the picture up into slices, and how to set the quantizer scale.

D.6.3.1 Slices in I-pictures

Division of the picture into slices is described in D.5.4.
D.6.3.2 Macroblocks in I-pictures
D.6.3.2.1 Macroblock types in I-pictures

There are two types of macroblock in I-pictures. Both use intra coding. One uses the current quantizer
scale, whereas the other defines a new value for the quantizer scale. They are identified in the coded
bitstream by the VLC codes given in table D.11.

Table D.11 - Macroblock type VLC for I-pictures (table B.2a)

TYPE      QUANT   VLC
intra-d   0       1
intra-q   1       01



The types are referred to by these names in this annex. Intra-d is the default type, in which the quantizer
scale is not changed. Intra-q sets the quantizer scale.

In order to allow for possible future extension to MPEG video, the VLC for intra-q is 01 rather than 0.
Additional types could be added to this table without interfering with the existing entries. The VLC table is
thus open for future additions, and not closed. A policy of making the coding tables open in this way was
adopted in developing this part of ISO/IEC 11172. The advantage of future extension was judged to be
worth the slight coding inefficiency.
D.6.3.2.2 Quantizer scale

If the macroblock type is intra-q, then the macroblock header contains a five-bit integer which defines the
quantizer scale. This is used by the decoder to calculate the DCT coefficients from the transmitted quantized
coefficients. A value of 0 is forbidden, so the quantizer scale can have any value between 1 and 31
inclusive.

Note that the quantizer scale is also set in the slice header.

If the macroblock type is intra-d, then no quantizer scale is transmitted and the decoder uses the previously set
value. For a discussion on strategies encoders might use to set the quantizer scale, see D.6.1.

Note that the cost of transmitting a new quantizer scale is six bits: one for the extra length of the
macroblock type code, and five to define the value. Although this is normally a small fraction of the bits
allocated to coding each macroblock, the encoder should exercise some restraint and avoid making a large
number of very small changes.

D.6.3.3 DCT transform 



The DCT is illustrated in figure D.28. 



Figure D.28 - Transformation of pels to coefficients ((a) pels; (b) DCT coefficients, with u increasing horizontal frequency and v increasing vertical frequency)

The pels are shown in raster scan order, whereas the coefficients are arranged in frequency order. The top left
coefficient is the dc term and is proportional to the average value of the component pel values. The other
coefficients are called ac coefficients. The ac coefficients to the right of the dc coefficient represent
increasing horizontal frequencies, whereas ac coefficients below the dc coefficient represent increasing
vertical frequencies. The remaining ac coefficients contain both horizontal and vertical frequency
components. Note that an image containing only vertical lines contains only horizontal frequencies.

The coefficient array contains all the information of the pel array and the pel array can be exactly 
reconstructed from the coefficient array, except for information lost by the use of finite arithmetic precision. 

The two-dimensional DCT is defined as

$$F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{\pi(2x+1)u}{16}\,\cos\frac{\pi(2y+1)v}{16}$$

with: u, v, x, y = 0, 1, 2, ... 7

where x, y = spatial coordinates in the pel domain
u, v = coordinates in the transform domain
$C(u) = 1/\sqrt{2}$ for $u = 0$, and $C(u) = 1$ otherwise
$C(v) = 1/\sqrt{2}$ for $v = 0$, and $C(v) = 1$ otherwise

This transform is separable, i.e. a one-dimensional DCT transform may be applied first in the horizontal
direction and then in the vertical direction. The formula for the one-dimensional transform is:

$$F(u) = \frac{1}{2}\,C(u)\sum_{x=0}^{7} f(x)\,\cos\frac{\pi(2x+1)u}{16}$$

where $C(u) = 1/\sqrt{2}$ for $u = 0$, and $C(u) = 1$ otherwise.

Fast DCT transforms exist, analogous to fast Fourier transforms. See reference [3]. 
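As an illustration of the separable definition, the following sketch evaluates the two-dimensional DCT directly from the formulas above, applying the one-dimensional transform along x and then along y. It is a slow reference, not one of the fast transforms just mentioned. The names `dct8x8`, `cos16` and `dct_scale` are hypothetical, and the cosines are tabulated only so that the sketch needs no math library. For a block of constant value 255 it should give a dc coefficient of 2 040 and ac coefficients of (almost) zero.

```c
#include <assert.h>

/* cos(k*pi/16) for k = 0..8; other angles follow by symmetry. */
static const double COS16[9] = {
    1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
    0.19509032201612826, 0.0
};

/* cos(m*pi/16) for any m >= 0, using the symmetries of the cosine. */
static double cos16(int m)
{
    assert(m >= 0);
    m %= 32;                          /* period 2*pi */
    if (m > 16) m = 32 - m;           /* cos(2*pi - a) = cos(a) */
    if (m > 8) return -COS16[16 - m]; /* cos(pi - a) = -cos(a) */
    return COS16[m];
}

/* C(k) = 1/sqrt(2) for k = 0, else 1 (1/sqrt(2) equals cos(pi/4)). */
static double dct_scale(int k) { return k == 0 ? COS16[4] : 1.0; }

/* Forward 8 x 8 DCT evaluated directly from the definition, applied separably.
   in[y][x] holds f(x,y); out[v][u] receives F(u,v). */
void dct8x8(double in[8][8], double out[8][8])
{
    double tmp[8][8];                 /* tmp[y][u]: 1-D DCT of row y along x */

    for (int y = 0; y < 8; y++)
        for (int u = 0; u < 8; u++) {
            double s = 0.0;
            for (int x = 0; x < 8; x++)
                s += in[y][x] * cos16((2 * x + 1) * u);
            tmp[y][u] = 0.5 * dct_scale(u) * s;
        }

    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++) {
            double s = 0.0;
            for (int y = 0; y < 8; y++)
                s += tmp[y][u] * cos16((2 * y + 1) * v);
            out[v][u] = 0.5 * dct_scale(v) * s;
        }
}
```

A block that varies only horizontally (vertical lines) should produce non-zero coefficients only in the top row, i.e. only horizontal frequencies.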

Input pel values have a range from 0 to 255, giving a dynamic range for the dc coefficient from 0 to
2 040 (64 x 255 / 8). The maximum dynamic range for any ac coefficient is about -1 000 to 1 000. Note that for P and
B-pictures the component pels represent difference values and range from -255 to 255. This gives a
maximum dynamic range for any coefficient of about -2 000 to 2 000. The encoder may thus represent the
coefficients using 12 bits whose values range from -2 048 to 2 047.

D.6.3.4 Quantization 

Each array of 8 by 8 coefficients produced by the DCT transform operation is quantized to produce an 8 by 8
array of quantized coefficients. Normally the number of non-zero quantized coefficients is quite small, and
this is one of the main reasons why the compression scheme works as well as it does.

The coefficients are quantized with a uniform quantizer. The characteristic of this quantizer, for I-blocks
only, is shown in figure D.29.

Figure D.29 - Uniform quantizer characteristics (quantized index plotted against coefficient value)

The value of the coefficient is divided by the quantizer step size and rounded to the nearest whole number to
produce the quantized coefficient. Half-integer values may be rounded up or down without directly affecting
image quality. However, rounding towards zero tends to give the smallest code size and so is preferred. For
example, with a step size of 16, all coefficients with values between 25 and 40 inclusive would give a
quantized coefficient of 2.
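A minimal sketch of this uniform quantizer with the preferred rounding: to the nearest integer, with exact halves sent toward zero. The function name `quantize_uniform` is hypothetical, and the step size is passed in directly rather than derived from the quantization matrix and quantizer scale.

```c
#include <assert.h>

/* Divide by the step size and round to the nearest integer,
   sending exact halves toward zero (the preferred choice above). */
int quantize_uniform(int coeff, int step)
{
    assert(step > 0);
    int mag = coeff < 0 ? -coeff : coeff;
    int q = mag / step;               /* truncated quotient */
    int rem = mag % step;
    if (2 * rem > step)               /* strictly more than half: round up */
        q++;
    return coeff < 0 ? -q : q;
}
```

With a step size of 16 this maps 25 to 40 inclusive to 2 (24 gives 1 and 41 gives 3); with the fixed dc step of eight it maps 21 to 3.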
The quantizer step size is derived from the quantization matrix and the quantizer scale. It can thus be
different for different coefficients, and may change between macroblocks. The only exception is the dc
coefficient, which is treated differently.

The eye is quite sensitive to large area luminance errors, and so the accuracy of coding the dc value is fixed.
The quantizer step size for the dc coefficients of the luminance and chrominance components is fixed at
eight. The dc quantized coefficient is obtained by dividing the dc coefficient by eight and rounding to the
nearest whole number. This effectively quantizes the average dc value to one part in 256 for the
reconstructed pels.

For example, a dc coefficient of 21 is quantized to a value of 3, independent of the value of the quantizer
scale.

The ac coefficients are quantized using the intra quantization matrix. The quantized coefficient i[u,v] is
produced by quantizing the coefficient c[u,v] for I-blocks. One possible formula is:

i[u,v] = 8 * c[u,v] // (q * m[u,v])

where m[u,v] is the corresponding element of the intra quantization matrix, q is the quantizer scale, and //
denotes integer division with rounding to the nearest integer. The quantized coefficient is limited to the range
-255 to +255.
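The intra ac formula can be sketched as follows, assuming `//` means integer division rounded to the nearest integer with halves rounded away from zero. The name `quantize_intra_ac` is hypothetical; an encoder may instead prefer the toward-zero rounding of halves discussed above, which would change only the exact-half cases.

```c
#include <assert.h>

/* i = 8 * c // (q * m), with // rounding to the nearest integer and halves
   away from zero; the result is limited to the range -255 to +255. */
int quantize_intra_ac(int c, int q, int m)
{
    assert(q >= 1 && q <= 31 && m > 0);
    int num = 8 * c;
    int den = q * m;
    int mag = num < 0 ? -num : num;
    int i = (mag + den / 2) / den;    /* nearest, halves away from zero */
    if (i > 255)
        i = 255;
    return num < 0 ? -i : i;
}
```

For example, c = 100 with q = 4 and m = 16 gives 800 / 64 = 12.5, which rounds to 13.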

The intra quantization matrix might be the default matrix, or it might have been downloaded in the sequence 
header. 

D.6.3.5 Coding of quantized coefficients 

The top left coefficient in figure D.28b is called the dc coefficient, the remainder are called ac coefficients. 
The dc coefficient is correlated with the dc coefficient of the preceding block, and advantage is taken of this 
in coding. The ac coefficients are not well correlated, and are coded independently. 

After the dc coefficient of a block has been quantized it is coded losslessly by a DPCM technique. Coding 
of the luminance blocks within a macroblock follows the raster scan order of figure D.5, 0 to 3. Thus the 
dc value of block 3 becomes the dc predictor for block 0 of the following macroblock. The dc value of each 
chrominance block is coded using the dc value of the corresponding block of the previous macroblock as a 
predictor. At the beginning of each slice, all three dc predictors (for Y, Cb and Cr) are set to 1 024 (128*8).

The differential dc values thus generated are categorized according to their absolute value as shown in table
D.12.

Table D.12 - Differential dc size and VLC

DIFFERENTIAL DC    SIZE   VLC CODE      VLC CODE
(absolute value)          (luminance)   (chrominance)
0                  0      100           00
1                  1      00            01
2 to 3             2      01            10
4 to 7             3      101           110
8 to 15            4      110           1110
16 to 31           5      1110          1111 0
32 to 63           6      1111 0        1111 10
64 to 127          7      1111 10       1111 110
128 to 255         8      1111 110      1111 1110


The size is transmitted using a VLC. This VLC is different for luminance and chrominance since the
statistics are different.

The size defines the number of additional bits required to define the level uniquely. Thus a size of 6 is 
followed by 6 additional bits. These bits define the level in order, from low to high. Thus the first of these 
extra bits gives the sign: 0 for negative and 1 for positive. A size of zero requires no additional bits. 

The additional codes are given in table D.13. 
Table D.13 - Differential dc additional code

DIFFERENTIAL DC   SIZE   ADDITIONAL CODE
-255 to -128      8      00000000 to 01111111
-127 to -64       7      0000000 to 0111111
-63 to -32        6      000000 to 011111
-31 to -16        5      00000 to 01111
-15 to -8         4      0000 to 0111
-7 to -4          3      000 to 011
-3 to -2          2      00 to 01
-1                1      0
0                 0      (none)
1                 1      1
2 to 3            2      10 to 11
4 to 7            3      100 to 111
8 to 15           4      1000 to 1111
16 to 31          5      10000 to 11111
32 to 63          6      100000 to 111111
64 to 127         7      1000000 to 1111111
128 to 255        8      10000000 to 11111111



For example, a luminance dc change of 10 would be coded as 1101010. Table D.12 shows that the first
three bits, 110, indicate that the size is 4. This means that four additional bits are required to define the exact
value. The next bit is a 1, and table D.13 shows that the differential dc value must be somewhere between
8 and 15 inclusive. The last three bits, 010, show that the exact value is 10.
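The size and additional-code rules of tables D.12 and D.13 fit in a short encoder. In this sketch the function name `encode_dc_diff` and the bit-string output are illustrative choices, not part of this part of ISO/IEC 11172. A negative differential is sent as diff + 2^size - 1, which reproduces the ranges of table D.13 (for example, -15 to -8 map to 0000 to 0111), so the first additional bit is 0 for negative and 1 for positive values.

```c
#include <assert.h>
#include <string.h>

/* Size VLCs of table D.12, indexed by size 0..8. */
static const char *const DC_SIZE_LUM[9] = {
    "100", "00", "01", "101", "110", "1110", "11110", "111110", "1111110"
};
static const char *const DC_SIZE_CHROM[9] = {
    "00", "01", "10", "110", "1110", "11110", "111110", "1111110", "11111110"
};

/* Writes the size VLC followed by the additional code of table D.13 into
   out as '0'/'1' characters; out must hold at least 17 bytes. */
void encode_dc_diff(int diff, int is_luma, char *out)
{
    assert(diff >= -255 && diff <= 255);
    int mag = diff < 0 ? -diff : diff;
    int size = 0;
    while ((1 << size) <= mag)        /* smallest size with mag < 2^size */
        size++;

    strcpy(out, is_luma ? DC_SIZE_LUM[size] : DC_SIZE_CHROM[size]);

    int bits = diff >= 0 ? diff : diff + (1 << size) - 1;
    size_t n = strlen(out);
    for (int b = size - 1; b >= 0; b--)
        out[n++] = ((bits >> b) & 1) ? '1' : '0';
    out[n] = '\0';
}
```

Calling it with diff = 10 for luminance yields the string 1101010 from the example above.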

The decoder reconstructs dc quantized coefficients by following the inverse procedure.

The ac quantized coefficients are coded using a run length and level technique. The quantized coefficients are
first scanned in the zigzag order shown in figure D.30.



(columns: increasing horizontal frequency, left to right; rows: increasing vertical frequency, top to bottom)

 1   2   6   7  15  16  28  29
 3   5   8  14  17  27  30  43
 4   9  13  18  26  31  42  44
10  12  19  25  32  41  45  54
11  20  24  33  40  46  53  55
21  23  34  39  47  52  56  61
22  35  38  48  51  57  60  62
36  37  49  50  58  59  63  64

Figure D.30 - Quantized coefficient block in zigzag scan order

The scanning order starts at 1, passes through 2, 3, etc. in order, eventually reaching 64 in the bottom right
corner. The length of a run is the number of zero quantized coefficients skipped over. For example, the
quantized coefficients in figure D.31 give the run lengths and levels shown in table D.14.
  1    0    0    0    0    0    0    0
  2   -3    0    0    0    0    0    0
  4   -5    0    0    0    0    0    0
  1    0    0  130    0    0    0    0
  0    0    0    0    0    0    0    0
  0    0    0    0    0    0    0    0
  0    0    0    0    0    0    0    0
  0    0    0    0    0    0    0    0

Figure D.31 - Example quantized coefficients
Table D.14 - Example run lengths and levels

RUN-LENGTH   LEVEL
1            2
0            4
0            -3
3            -5
0            1
14           130
end





The scan starts at position 2 since the top left quantized coefficient is coded separately as the dc quantized
coefficient.
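The scan and run-length steps can be sketched as below. `ZIGZAG` lists, for each scan position of figure D.30 minus one, the raster index row * 8 + column of that coefficient; `run_levels` and `run_level` are hypothetical names. Applied to the block of figure D.31 it should give the six pairs of table D.14, with the end of block implied after the last pair.

```c
#include <assert.h>

/* ZIGZAG[k] = raster index (row * 8 + column) of scan position k + 1 in figure D.30. */
static const int ZIGZAG[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

typedef struct { int run; int level; } run_level;

/* Scans the ac coefficients (positions 2..64) of a raster-ordered 8 x 8 block
   and writes (run, level) pairs; returns the number of pairs written. */
int run_levels(const int block[64], run_level out[63])
{
    assert(block != 0 && out != 0);
    int n = 0, run = 0;
    for (int k = 1; k < 64; k++) {    /* k = 0 is the dc coefficient */
        int v = block[ZIGZAG[k]];
        if (v == 0) {
            run++;
            continue;
        }
        out[n].run = run;
        out[n].level = v;
        n++;
        run = 0;
    }
    return n;
}
```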

Using a zigzag scan rather than a raster scan is more efficient, as it gives fewer runs, which can be coded with
shorter VLC codes.

The list of run lengths and levels is coded using table D.15. Not all possible combinations of run length
and level are in this table, only the more common ones. For combinations not in the table, an escape
sequence is used. In table D.15, the last bit 's' denotes the sign of the level: 0 means a positive level and 1
means a negative level. The escape code is followed by the run length derived from table D.16 and then
the level from table D.17.
Table D.15 - Combination codes for DCT quantized coefficients (s = 0 for positive level, s = 1 for negative level)

RUN   LEVEL   VLC CODE
EOB           10
0     1       1s (if first coefficient)
0     1       11s (if not first coefficient)
0     2       0100 s
0     3       0010 1s
0     4       0000 110s
0     5       0010 0110 s
0     6       0010 0001 s
0     7       0000 0010 10s
0     8       0000 0001 1101 s
0     9       0000 0001 1000 s
0     10      0000 0001 0011 s
0     11      0000 0001 0000 s
0     12      0000 0000 1101 0s
0     13      0000 0000 1100 1s
0     14      0000 0000 1100 0s
0     15      0000 0000 1011 1s
0     16      0000 0000 0111 11s
0     17      0000 0000 0111 10s
0     18      0000 0000 0111 01s
0     19      0000 0000 0111 00s
0     20      0000 0000 0110 11s
0     21      0000 0000 0110 10s
0     22      0000 0000 0110 01s
0     23      0000 0000 0110 00s
0     24      0000 0000 0101 11s
0     25      0000 0000 0101 10s
0     26      0000 0000 0101 01s
0     27      0000 0000 0101 00s
0     28      0000 0000 0100 11s
0     29      0000 0000 0100 10s
0     30      0000 0000 0100 01s
0     31      0000 0000 0100 00s
0     32      0000 0000 0011 000s
0     33      0000 0000 0010 111s
0     34      0000 0000 0010 110s
0     35      0000 0000 0010 101s
0     36      0000 0000 0010 100s
0     37      0000 0000 0010 011s
0     38      0000 0000 0010 010s
0     39      0000 0000 0010 001s
0     40      0000 0000 0010 000s
1     1       011s
1     2       0001 10s
1     3       0010 0101 s
1     4       0000 0011 00s
1     5       0000 0001 1011 s
1     6       0000 0000 1011 0s
1     7       0000 0000 1010 1s
1     8       0000 0000 0011 111s
1     9       0000 0000 0011 110s
1     10      0000 0000 0011 101s
1     11      0000 0000 0011 100s
1     12      0000 0000 0011 011s
1     13      0000 0000 0011 010s
1     14      0000 0000 0011 001s
1     15      0000 0000 0001 0011s
1     16      0000 0000 0001 0010s
1     17      0000 0000 0001 0001s
1     18      0000 0000 0001 0000s
2     1       0101 s
2     2       0000 100s
2     3       0000 0010 11s
2     4       0000 0001 0100 s
2     5       0000 0000 1010 0s
3     1       0011 1s
3     2       0010 0100 s
3     3       0000 0001 1100 s
3     4       0000 0000 1001 1s
4     1       0011 0s
4     2       0000 0011 11s
4     3       0000 0001 0010 s
5     1       0001 11s
5     2       0000 0010 01s
5     3       0000 0000 1001 0s
6     1       0001 01s
6     2       0000 0001 1110 s
6     3       0000 0000 0001 0100s
7     1       0001 00s
7     2       0000 0001 0101 s
8     1       0000 111s
8     2       0000 0001 0001 s
9     1       0000 101s
9     2       0000 0000 1000 1s
10    1       0010 0111 s
10    2       0000 0000 1000 0s
11    1       0010 0011 s
11    2       0000 0000 0001 1010s
12    1       0010 0010 s
12    2       0000 0000 0001 1001s
13    1       0010 0000 s
13    2       0000 0000 0001 1000s
14    1       0000 0011 10s
14    2       0000 0000 0001 0111s
15    1       0000 0011 01s
15    2       0000 0000 0001 0110s
16    1       0000 0010 00s
16    2       0000 0000 0001 0101s
17    1       0000 0001 1111 s
18    1       0000 0001 1010 s
19    1       0000 0001 1001 s
20    1       0000 0001 0111 s
21    1       0000 0001 0110 s
22    1       0000 0000 1111 1s
23    1       0000 0000 1111 0s
24    1       0000 0000 1110 1s
25    1       0000 0000 1110 0s
26    1       0000 0000 1101 1s
27    1       0000 0000 0001 1111s
28    1       0000 0000 0001 1110s
29    1       0000 0000 0001 1101s
30    1       0000 0000 0001 1100s
31    1       0000 0000 0001 1011s
ESCAPE        0000 01
Table D.16 - Zero run length codes

RUN-LENGTH   CODE
0            000000
1            000001
2            000010
...          ...
62           111110
63           111111



Table D.17 - Level codes for DCT quantized coefficients

LEVEL   CODE
-256    FORBIDDEN
-255    1000 0000 0000 0001
-254    1000 0000 0000 0010
...     ...
-129    1000 0000 0111 1111
-128    1000 0000 1000 0000
-127    1000 0001
-126    1000 0010
...     ...
-2      1111 1110
-1      1111 1111
0       FORBIDDEN
1       0000 0001
2       0000 0010
...     ...
126     0111 1110
127     0111 1111
128     0000 0000 1000 0000
129     0000 0000 1000 0001
...     ...
254     0000 0000 1111 1110
255     0000 0000 1111 1111
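When a run length, level combination is not in table D.15, the escape sequence can be assembled as in this sketch. The helper names `put_bits` and `encode_escape` are hypothetical. It assumes, as table D.17 indicates, that the 8-bit level code is the two's complement of the level for magnitudes up to 127, and that larger magnitudes use 16-bit forms with the prefix 0000 0000 (positive) or 1000 0000 (negative). The pairs (3, -5) and (14, 130) of table D.14 give the two escape codes of the example that follows.

```c
#include <assert.h>
#include <string.h>

/* Appends the low `width` bits of value to out as '0'/'1' characters. */
static void put_bits(char *out, unsigned value, int width)
{
    size_t n = strlen(out);
    for (int b = width - 1; b >= 0; b--)
        out[n++] = ((value >> b) & 1u) ? '1' : '0';
    out[n] = '\0';
}

/* Escape "000001", a 6-bit run (table D.16), then the level (table D.17).
   out must hold at least 29 bytes. */
void encode_escape(int run, int level, char *out)
{
    assert(run >= 0 && run <= 63);
    assert(level >= -255 && level <= 255 && level != 0);
    strcpy(out, "000001");
    put_bits(out, (unsigned)run, 6);
    if (level >= -127 && level <= 127) {
        put_bits(out, (unsigned)level & 0xFFu, 8);   /* 8-bit two's complement */
    } else if (level > 0) {
        put_bits(out, 0x00u, 8);                     /* prefix 0000 0000 */
        put_bits(out, (unsigned)level, 8);
    } else {
        put_bits(out, 0x80u, 8);                     /* prefix 1000 0000 */
        put_bits(out, (unsigned)(level + 256), 8);
    }
}
```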



Using tables D.15 through D.17 we can derive the VLC codes for the example of table D.14:

Table D.18 - Example run lengths, values, and VLC codes

RUN   VALUE   VLC CODE
1     2       0001 100
0     4       0000 1100
0     -3      0010 11
3     -5      0000 0100 0011 1111 1011
0     1       110
14    130     0000 0100 1110 0000 0000 1000 0010
EOB           10





There are two codes for the run length, level combination 0,1, as indicated in table D.15. Intra blocks
always code the dc coefficient separately, as a dc quantized coefficient coded by the DPCM method described above.
Consequently, intra blocks always use the code 11s to denote a run length, level combination of 0,1. Note
that predictively coded blocks code the dc quantized coefficient differently, and may use the code 1s.



D.6.4 Coding P-pictures 

As in I-pictures, each P-picture is divided up into one or more slices, which are, in turn, divided into
macroblocks. Coding is more complex than for I-pictures, since motion-compensated macroblocks may be
constructed. The difference between the motion-compensated macroblock and the current macroblock is
transformed with a two-dimensional DCT giving an array of 8 by 8 transform coefficients. The coefficients
are quantized, and the quantized coefficients are then encoded using a run-length value technique.

The encoder needs to store the decoded P-picture since this may be used as the starting point
for motion compensation. Therefore, the encoder will reconstruct the image from the quantized coefficients.

In coding P-pictures, the encoder has more decisions to make than in the case of I-pictures. These decisions
are: how to divide the picture up into slices, how to determine the motion vectors to use, whether to
code each macroblock as intra or predicted, and how to set the quantizer scale.

D.6.4.1 Slices in P-pictures

P-pictures are divided into slices in the same way as I-pictures. The same considerations as to the best
method of dividing a picture into slices apply; see D.5.4.

D.6.4.2 Macroblocks in P-pictures 

Slices are divided into macroblocks in the same way as for I-pictures. The major difference is the
complexity introduced by motion compensation.

The macroblock header may contain stuffing. The position of the macroblock is determined by the
macroblock address. Whereas the macroblock address increment within a slice for I-pictures is restricted to
one, in P-pictures it may be greater than one. Any macroblocks thus skipped over are called "skipped
macroblocks". They are copied from the previous picture into the current picture. Skipped macroblocks
behave as predicted macroblocks with a zero motion vector for which no additional correction is available.
They require very few bits to transmit.

The next field in the macroblock header defines the macroblock type. 
D.6.4.2.1 Macroblock types in P-pictures
There are eight types of macroblock in P-pictures: 

Table D.19 - Macroblock type VLC for P-pictures (table B.2b)

TYPE       VLC       INTRA   MOTION FORWARD   CODED PATTERN   QUANT
pred-mc    1         0       1                1               0
pred-c     01        0       0                1               0
pred-m     001       0       1                0               0
intra-d    0001 1    1       0                0               0
pred-mcq   0001 0    0       1                1               1
pred-cq    0000 1    0       0                1               1
intra-q    0000 01   1       0                0               1
skipped    N/A       -       -                -               -



Not all possible combinations of motion compensation, coding, quantization, and intra coding occur. For
example, with intra-coded macroblocks, intra-d and intra-q, motion vectors are not transmitted.

Skipped macroblocks have no VLC code. Instead they are coded by having the macroblock address 
increment code skip over them. 
D.6.4.2.2 Quantizer scale

If the macroblock type is pred-mcq, pred-cq or intra-q, i.e. if the QUANT column in table D.19 has a 1,
then a quantizer scale is transmitted. If the macroblock type is pred-mc, pred-c or intra-d, then the DCT
correction is coded using the previously established value for the quantizer scale.

D.6.4.2.3 Motion vectors

If the macroblock type is pred-m, pred-mc or pred-mcq, i.e. if the MOTION FORWARD column in table
D.19 has a 1, then horizontal and vertical forward motion vectors are transmitted in succession.

D.6.4.2.4 Coded block pattern 

If the macroblock type is pred-c, pred-mc, pred-cq or pred-mcq, i.e. if the CODED PATTERN column in
table D.19 has a 1, then a coded block pattern is transmitted. This informs the decoder which of the six
blocks in the macroblock are coded, i.e. have transmitted DCT quantized coefficients, and which are not
coded, i.e. have no additional correction after motion compensation.

The coded block pattern is a number from 0 to 63 that indicates which of the blocks are coded, i.e. have at
least one transmitted coefficient, and which are not coded. To understand the structure of the coded block
pattern, introduce six variables PN to indicate the status of each of the six blocks, where N is the block
number. If block N is coded, PN is one; if it is not, then PN is zero. The coded block pattern is
defined by the following equation:

CBP = 32*P0 + 16*P1 + 8*P2 + 4*P3 + 2*P4 + P5 
This is equivalent to the definition given in 2.4.3.6. 

For example, if the top two luminance blocks and the Cb block are coded, and the other three are not, then
P0 = 1, P1 = 1, P2 = 0, P3 = 0, P4 = 1, and P5 = 0. The coded block pattern is:

CBP = 32*1 + 16*1 + 8*0 + 4*0 + 2*1 + 0 = 50 
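The coded block pattern formula can be evaluated by shifting in one bit per block, P0 first, so that P0 ends up weighted by 32. This sketch uses a hypothetical function name, `coded_block_pattern`, and assumes blocks are numbered 0 to 3 (luminance), 4 (Cb) and 5 (Cr), as in the example above.

```c
#include <assert.h>

/* coded[n] is non-zero if block n has at least one transmitted coefficient.
   Returns CBP = 32*P0 + 16*P1 + 8*P2 + 4*P3 + 2*P4 + P5. */
int coded_block_pattern(const int coded[6])
{
    assert(coded != 0);
    int cbp = 0;
    for (int n = 0; n < 6; n++)
        cbp = 2 * cbp + (coded[n] != 0);
    return cbp;
}
```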

Some coded block patterns are more common than others. Advantage is taken of this fact to increase the coding
efficiency by transmitting a VLC representing the coded block pattern, rather than the coded block pattern
itself. The VLC codes are given in table D.20.

Table D.20 - VLC table for coded block pattern

CBP   VLC CODE
60    111
4     1101
8     1100
16    1011
32    1010
12    1001 1
48    1001 0
20    1000 1
40    1000 0
28    0111 1
44    0111 0
52    0110 1
56    0110 0
1     0101 1
61    0101 0
2     0100 1
62    0100 0
24    0011 11
36    0011 10
3     0011 01
63    0011 00
5     0010 111
9     0010 110
17    0010 101
33    0010 100
6     0010 011
10    0010 010
18    0010 001
34    0010 000
7     0001 1111
11    0001 1110
19    0001 1101
35    0001 1100
13    0001 1011
49    0001 1010
21    0001 1001
41    0001 1000
14    0001 0111
50    0001 0110
22    0001 0101
42    0001 0100
15    0001 0011
51    0001 0010
23    0001 0001
43    0001 0000
25    0000 1111
37    0000 1110
26    0000 1101
38    0000 1100
29    0000 1011
45    0000 1010
53    0000 1001
57    0000 1000
30    0000 0111
46    0000 0110
54    0000 0101
58    0000 0100
31    0000 0011 1
47    0000 0011 0
55    0000 0010 1
59    0000 0010 0
27    0000 0001 1
39    0000 0001 0
Thus the coded block pattern of the previous example, 50, would be represented by the code "00010110".

Note that there is no code representing the state in which none of the blocks are coded, a coded block pattern 
equal to zero. Instead, this state is indicated by the macroblock type. 

For macroblocks in I-pictures, and for intra coded macroblocks in P and B-pictures, the coded block pattern
is not transmitted, but is assumed to have a value of 63, i.e. all the blocks in the macroblock are coded.

The technique of transmitting a coded block pattern, instead of transmitting an end of block code for each block
that is not coded, follows the practice …
D.6.4.3 Selection of macroblock type 

An encoder has the difficult task of choosing between the different types of macroblocks. 

An exhaustive method is to try coding a macroblock to the same degree of accuracy using each type, then
choose the type that requires the fewest coding bits.

A simpler method, and one that is computationally less expensive, is to make a series of decisions. One
way to order these decisions is:



1. Motion compensation or no motion compensation, i.e. is a motion vector transmitted or is it assumed
   to be zero.
2. Intra or non-intra coding, i.e. is the macroblock type intra or is it predicted using the motion vector
   found in step 1.
3. If the macroblock type is non-intra, is it coded or not coded, i.e. is the residual error large enough to
   be coded using the DCT transform.
4. Decide if the quantizer scale is satisfactory or should be changed.
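The way these four decisions select one of the macroblock types of table D.19 can be written as a small function. It is a sketch only: `p_mb_type` and `p_macroblock_type` are hypothetical names, and the decisions themselves are passed in, since the criteria for making them are discussed in the following clauses.

```c
#include <assert.h>

/* P-picture macroblock types of table D.19. */
typedef enum {
    MB_PRED_MC, MB_PRED_C, MB_PRED_M, MB_INTRA_D,
    MB_PRED_MCQ, MB_PRED_CQ, MB_INTRA_Q, MB_SKIPPED
} p_mb_type;

/* Maps the yes/no outcome of each decision step to a macroblock type. */
p_mb_type p_macroblock_type(int use_mc, int intra, int coded, int new_quant)
{
    if (intra)                /* step 2: intra macroblocks carry no motion vector */
        return new_quant ? MB_INTRA_Q : MB_INTRA_D;
    if (!coded)               /* step 3: no DCT correction, so no quantizer either */
        return use_mc ? MB_PRED_M : MB_SKIPPED;
    if (use_mc)               /* step 4 for coded, motion-compensated macroblocks */
        return new_quant ? MB_PRED_MCQ : MB_PRED_MC;
    return new_quant ? MB_PRED_CQ : MB_PRED_C;
}
```

A non-intra macroblock with a zero vector and no residual to code ends up as a skipped macroblock, which costs no macroblock type code at all.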



These decisions are summarized in figure D.32.

Figure D.32 - Selection of macroblock types in P-pictures (decision tree: MC or no MC, then intra or non-intra, then coded or not coded, then quant or no quant; leaves: pred-mcq, pred-mc, pred-m, pred-cq, pred-c, skipped, intra-q, intra-d)
The four decision steps are discussed in the next four clauses. 
D.6.4.3.1 Motion compensation decision 

The encoder has the option of transmitting or not transmitting motion vectors for predictive-coded
macroblocks. If the motion vector is zero, then some code may be saved by not transmitting the motion vectors. Thus one
algorithm is to search for the best match and compare the error of the predicted block with that formed with
a zero vector. If the motion-compensated block is only slightly better than the uncompensated block using
the selected block matching criterion, then the zero vector might be used to save coding bits.
One algorithm, used in the simulation model during the development of CCITT Recommendation H.261 and
this part of ISO/IEC 11172, is based on the sum of absolute differences of all the luminance pels in a
macroblock when compared with the motion-compensated macroblock. The sum is computed both for the
motion-compensated macroblock and for the zero vector, and the choice of whether to make use of the
motion vector is then decided by figure D.33.




Figure D.33 - Characteristic MC/No MC 

Points on the line dividing the No MC (no motion compensation, i.e. zero vector) region from the MC (motion
compensation) region are regarded as belonging to the no motion compensation region.

If the error for the zero vector is sufficiently low, then no motion compensation should be used. Thus a
way to speed up the decision is to examine the zero vector first and decide if it is good enough.
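The block-matching criterion described above, a sum of absolute differences over the luminance pels, can be sketched as follows. The name `sad16x16` is hypothetical, and the sketch only computes the criterion; the decision boundary of figure D.33 is not reproduced. An encoder could call it first with the zero-vector prediction (the co-located macroblock of the previous picture) and only go on to a motion search if that result is not good enough.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute differences between the 16 x 16 luminance pels of the
   current macroblock and a candidate prediction. */
long sad16x16(unsigned char cur[16][16], unsigned char pred[16][16])
{
    assert(cur != 0 && pred != 0);
    long sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += labs((long)cur[y][x] - (long)pred[y][x]);
    return sum;
}
```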

The foregoing algorithm was designed for telecommunications sequences in which the camera was fixed, and
in which spurious motion could be caused by the "dragging effect" of nearby moving objects. Care was
taken to reduce this spurious motion, and this accounts for the unusual shape of the boundary between the
two regions in figure D.33.

D.6.4.3.2 Intra/non-intra coding decision 

Trying both intra and non-intra coding for each macroblock may be too time-consuming for the encoder, and
a faster algorithm may be required.

One such algorithm, used in the simulation model during the development of this part of ISO/IEC 11172,
compares the variance of the current macroblock with the variance of the difference macroblock
(current - motion-compensated previous). The variances are computed by the following C code. In computing the
variance of the difference macroblock, the average value is assumed to be zero.
    int pelp[16][16];  /* Pel values in the Previous macroblock after motion compensation */
    int pelc[16][16];  /* Pel values in the Current macroblock */
    long dif;          /* Difference between two pel values */
    long sum;          /* Sum of the current pel values */
    long vard;         /* Variance of the Difference macroblock */
    long varc;         /* Variance of the Current macroblock */
    int x, y;          /* coordinates */

    sum = 0;
    vard = 0;
    varc = 0;

    for (y = 0; y < 16; y++) {
        for (x = 0; x < 16; x++) {
            sum = sum + pelc[y][x];
            varc = varc + (pelc[y][x] * pelc[y][x]);
            dif = pelc[y][x] - pelp[y][x];
            vard = vard + (dif * dif);
        }
    }

    vard = vard / 256;  /* assumes mean is close to zero */
    varc = (varc / 256) - ((sum / 256) * (sum / 256));

The decision as to whether to code as intra or non intra is then based on figure D.34. 




Figure D.34 - Characteristic intra/non-intra 
Points on the line dividing the non-intra from the intra regions are regarded as belonging to the non-intra
region.
D.6.4.3.3 Coded/not coded decision 

The choice of coded or not coded is a result of quantization: when all the quantized coefficients of a block are
zero, the block is not coded. A macroblock is not coded if none of its blocks are coded.

D.6.4.3.4 Quantizer/no quantizer decision 
D.6.4.4 DCT transform 

Pels of intra blocks are transformed into quantized coefficients in the same way that they were for
I-pictures. The prediction of the dc coefficient differs, however: the dc predicted values are
set to 1 024 (128*8) for intra blocks in P and B-pictures, unless the previous block was intra coded.

Coefficients of non-intra blocks are coded in a similar way. The main difference is that the coefficients to
be transformed represent differences between pel values rather than the pel values themselves. The
differences are obtained by subtracting the motion-compensated pel values from the previous picture from
the pel values in the current macroblock. Since the coding is of differences, there is no spatial prediction of
the dc coefficient.
D.6.4.5 Quantization of P-picturos 

Intra maooblocks in P and B-pictures are quantized using the same method as described for I-pictures. 

int coefforig;    /* original coefficient */
int coeffqant;    /* quantized coefficient */
int coeffrec;     /* reconstructed coefficient */
int niqmatrix;    /* non-intra quantization matrix */
int quantscale;   /* quantizer scale */

coeffqant = (8 * coefforig) / (quantscale * niqmatrix);
The process is illustrated below:

niqmatrix   quantscale   coefforig     coeffqant   coeffrec
16          10           -39 to -20    -1          -29
16          10           -19 to 19      0            0
16          10            20 to 39      1           29
16          10            40 to 59      2           49
16          10            60 to 79      3           69


The following diagram shows the characteristics of this quantizer. The flat spot around zero gives this type
of quantizer its name: a dead-zone quantizer.



Figure D.35 - Dead zone quantizer characteristic (horizontal axis: coefficient; vertical axis: quantized coefficient)
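The quantizer expression given earlier can be checked against the example values (niqmatrix = 16, quantscale = 10). The sketch below pairs it with a reconstruction rule taken from the MPEG-1 non-intra inverse quantization (scale by `2q + sign(q)`, force the result odd, clip to -2048..2047). That rule comes from the normative decoding process rather than this annex, and the function names are hypothetical. With these values it reproduces the -29, 0, 29, 49 and 69 levels listed in the table.

```c
/* Returns -1, 0 or +1. */
static int sign_of(int v) { return (v > 0) - (v < 0); }

/* Dead-zone quantizer for one non-intra coefficient. C99 integer division
   truncates toward zero, which is what widens the zero step. */
int quantize_nonintra(int coefforig, int quantscale, int niqmatrix)
{
    return (8 * coefforig) / (quantscale * niqmatrix);
}

/* Non-intra reconstruction as in the MPEG-1 decoding process (assumed here). */
int reconstruct_nonintra(int coeffqant, int quantscale, int niqmatrix)
{
    if (coeffqant == 0)
        return 0;
    int rec = ((2 * coeffqant + sign_of(coeffqant)) * quantscale * niqmatrix) / 16;
    if ((rec & 1) == 0)          /* force an odd value (mismatch control) */
        rec -= sign_of(rec);
    if (rec > 2047)  rec = 2047;
    if (rec < -2048) rec = -2048;
    return rec;
}
```

For example, an original coefficient of 45 quantizes to 2 and reconstructs to 49.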
D.6.4.6 Coding of quantized coefficients 
D.6.4.6.1 Coding of intra blocks 

Intra blocks in P-pictures are coded in the same way as intra blocks in I-pictures. The only difference lies in
the prediction of the dc coefficient: the dc predicted value is 128, unless the previous block was intra coded.
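A sketch of the dc prediction just described, working in quantized units so that the reset value is 1024 / 8 = 128. It assumes, as in the MPEG-1 decoding process, one predictor each for Y, Cb and Cr, and a reset whenever the previous macroblock was not intra coded. `dc_predictors` and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical predictor state: one dc predictor per colour component
   (0 = Y, 1 = Cb, 2 = Cr), held in quantized units. */
typedef struct { int pred[3]; } dc_predictors;

/* Resets all predictors to 128, i.e. 1024 / 8. */
void dc_reset(dc_predictors *p)
{
    for (int c = 0; c < 3; c++)
        p->pred[c] = 128;
}

/* Call once at the start of each intra macroblock. */
void dc_start_macroblock(dc_predictors *p, bool prev_mb_intra)
{
    if (!prev_mb_intra)
        dc_reset(p);
}

/* Returns the differential to be coded for one block's quantized dc value
   and updates the predictor of that component for the next block. */
int dc_differential(dc_predictors *p, int component, int dc_quant)
{
    int diff = dc_quant - p->pred[component];
    p->pred[component] = dc_quant;
    return diff;
}
```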
D.6.4.6.2 Coding of non-intra blocks

The coded block pattern is transmitted, indicating which blocks have coefficient data. These are coded in a
similar way to the coding of intra blocks, except that the dc coefficient is coded in the same way as the ac
coefficients.
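For a non-intra block the scan starts at the first coefficient, so the dc term becomes the first (run, level) pair like any ac coefficient. The sketch assumes the standard 8x8 zigzag scan and stops before variable-length coding (the VLC tables and the end_of_block code are not shown). The type and function names are illustrative.

```c
/* Zigzag scan: entry k is the row-major index (8 * row + col) of the
   k-th coefficient in scan order. */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

typedef struct { int run; int level; } run_level;

/* Converts one non-intra block of quantized coefficients (row-major) into
   (run, level) pairs, where run counts the zeros before each nonzero level.
   Returns the number of pairs; 0 means the block is not coded. */
int nonintra_run_levels(const int coeff[64], run_level out[64])
{
    int n = 0, run = 0;
    for (int k = 0; k < 64; k++) {
        int level = coeff[zigzag[k]];
        if (level == 0) {
            run++;
        } else {
            out[n].run = run;
            out[n].level = level;
            n++;
            run = 0;
        }
    }
    return n;
}
```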

D.6.5 Coding B-pictures 

As in I and P-pictures, each B-picture is divided up into one or more slices, which are, in turn, divided into
macroblocks. Coding is more complex than for P-pictures, since several types of motion-compensated
macroblock may be constructed: forward, backward, and interpolated. The difference between the motion-
compensated macroblock and the current macroblock is transformed with a two-dimensional DCT, giving an
array of 8 by 8 transform coefficients. The coefficients are quantized to produce a set of quantized
coefficients. The quantized coefficients are then encoded using a run-length value technique.

The encoder does not need to store the decoded B-pictures since they will not be used for motion 
compensation. 

In coding B-pictures, the encoder has more decisions to make than in the case of P-pictures. These
decisions are: how to divide the picture up into slices, which motion vectors to use, whether to use
forward, backward or interpolated motion compensation or to code as intra, and how to set the quantizer
scale.

D.6.5.1 Slices in B-pictures 

B-pictures are divided into slices in the same way as I and P-pictures. Since B-pictures are not used as a 
reference for motion compensation, errors in B-pictures are slightly less important than in I or P-pictures.
Consequently, it might be appropriate to use fewer slices for B-pictures. 

D.6.5.2 Macroblocks in B-pictures 

Slices are divided into macroblocks in the same way as for I-pictures. 

The macroblock header may contain stuffing. The position of the macroblock is determined by the
macroblock address. Whereas the macroblock address increment within a slice for I-pictures is restricted to
one, it may be larger for B-pictures. Any macroblocks thus skipped over are called "skipped macroblocks".
Skipped macroblocks in B-pictures differ from skipped macroblocks in P-pictures. Whereas in P-pictures
skipped macroblocks have a motion vector equal to zero, in B-pictures skipped macroblocks have the same
motion vector and the same macroblock type as the previous macroblock, which cannot be intra coded. As
there is no additional DCT correction, they require very few bits to transmit.
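The skipping rules above can be sketched as a function that fills in the macroblocks implied by a macroblock address increment greater than one. The `mb_info` record is a hypothetical simplification for this sketch, not a structure defined by ISO/IEC 11172.

```c
#include <stdbool.h>

/* Hypothetical, simplified macroblock record. */
typedef struct {
    int  address;
    bool intra;
    bool fwd, bwd;     /* which motion vectors are used */
    int  fwd_mv[2];    /* horizontal, vertical */
    int  bwd_mv[2];
    bool coded;        /* has a DCT correction */
} mb_info;

/* Writes the macroblocks skipped between `prev` and the next transmitted
   macroblock, whose address increment is `increment`. In B-pictures a skipped
   macroblock repeats the type and vectors of `prev` (which must not be intra);
   in P-pictures it is forward predicted with a zero vector. None is coded.
   Returns the number of skipped macroblocks written to `out`. */
int expand_skipped(const mb_info *prev, int increment, bool b_picture, mb_info *out)
{
    int n = 0;
    for (int a = prev->address + 1; a < prev->address + increment; a++) {
        mb_info m;
        if (b_picture) {
            m = *prev;              /* same type and motion vectors */
        } else {
            m = (mb_info){0};       /* zero forward motion vector */
            m.fwd = true;
        }
        m.address = a;
        m.intra = false;
        m.coded = false;            /* no additional DCT correction */
        out[n++] = m;
    }
    return n;
}
```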

The next field in the macroblock header defines the macroblock type. 
D.6.5.2.1 Macroblock types in B-pictures

There are 12 types of macroblock in B-pictures: 

Table D.21 - Macroblock type VLC for B-pictures (table B.2d)

TYPE       VLC       INTRA   MOTION    MOTION     CODED     QUANT
                             FORWARD   BACKWARD   PATTERN
pred-i     10        0       1         1          0         0
pred-ic    11        0       1         1          1         0
pred-b     010       0       0         1          0         0
pred-bc    011       0       0         1          1         0
pred-f     0010      0       1         0          0         0
pred-fc    0011      0       1         0          1         0
intra-d    0001 1    1       0         0          0         0
pred-icq   0001 0    0       1         1          1         1
pred-fcq   0000 11   0       1         0          1         1
pred-bcq   0000 10   0       0         1          1         1
intra-q    0000 01   1       0         0          0         1
skipped    N/A
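Because the codes in table D.21 are prefix-free, a decoder can identify the macroblock type by matching the leading bits against each row. This sketch uses character strings of '0' and '1' for readability instead of a real bit reader; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* One row of table D.21: name, VLC and the five flags. */
typedef struct {
    const char *name;
    const char *vlc;
    bool intra, motion_forward, motion_backward, coded_pattern, quant;
} b_mb_type;

static const b_mb_type b_types[] = {
    {"pred-i",   "10",     0, 1, 1, 0, 0},
    {"pred-ic",  "11",     0, 1, 1, 1, 0},
    {"pred-b",   "010",    0, 0, 1, 0, 0},
    {"pred-bc",  "011",    0, 0, 1, 1, 0},
    {"pred-f",   "0010",   0, 1, 0, 0, 0},
    {"pred-fc",  "0011",   0, 1, 0, 1, 0},
    {"intra-d",  "00011",  1, 0, 0, 0, 0},
    {"pred-icq", "00010",  0, 1, 1, 1, 1},
    {"pred-fcq", "000011", 0, 1, 0, 1, 1},
    {"pred-bcq", "000010", 0, 0, 1, 1, 1},
    {"intra-q",  "000001", 1, 0, 0, 0, 1},
};

/* Matches the start of `bits` against the codes of table D.21. Returns the
   matching row and stores the code length in *len, or NULL if none matches. */
const b_mb_type *decode_b_mb_type(const char *bits, int *len)
{
    for (size_t i = 0; i < sizeof b_types / sizeof b_types[0]; i++) {
        size_t n = strlen(b_types[i].vlc);
        if (strncmp(bits, b_types[i].vlc, n) == 0) {
            *len = (int)n;
            return &b_types[i];
        }
    }
    return NULL;
}
```

For example, the input "0010111" decodes as pred-f with a 4-bit code, leaving "111" for the following fields.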











Compared with P-pictures, there are extra types due to the introduction of the backward motion vector. If
only a forward motion vector is present, then the motion-compensated macroblock is constructed from a
previous picture, as in P-pictures. If only a backward motion vector is present, then the motion-
compensated macroblock is constructed from a future picture. If both forward and backward motion vectors
are present, then motion-compensated macroblocks are constructed from both previous and future pictures,
and the result is averaged to form the "interpolated" motion-compensated macroblock.
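The interpolated prediction can be sketched as a pel-by-pel average of the forward and backward predictions. The rounding used here (halves rounded up, which for non-negative pels equals rounding away from zero) follows the MPEG-1 decoding process; the function name is illustrative.

```c
/* Averages forward and backward predictions pel by pel into `out`.
   n is the number of pels (256 for a 16x16 luminance macroblock). */
void interpolate_pred(const unsigned char *fwd, const unsigned char *bwd,
                      unsigned char *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (unsigned char)((fwd[i] + bwd[i] + 1) / 2); /* round half up */
}
```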

D.6.5.2.2 Quantizer scale 

If the macroblock type is pred-icq, pred-fcq, pred-bcq or intra-q, i.e. if the QUANT column in table D.21
has a 1, then a quantizer scale is transmitted. For the remaining macroblock types, the DCT correction is
coded using the previously established value for the quantizer scale.

D.6.5.2.3 Motion vectors

If the MOTION FORWARD column in table D.21 has a 1, then horizontal and vertical forward motion
vectors are transmitted in succession. If the MOTION BACKWARD column in table D.21 has a 1, then
horizontal and vertical backward motion vectors are transmitted in succession. If both types are present, then
four component vectors are transmitted in the following order:

horizontal forward 
vertical forward 
horizontal backward 
vertical backward 

D.6.5.2.4 Coded block pattern 

If the CODED PATTERN column in table D.21 has a 1, then a coded block pattern is transmitted. This
informs the decoder which of the six blocks in the macroblock are coded, i.e. have transmitted DCT
coefficients, and which are not coded, i.e. have no transmitted coefficients.
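A sketch of deriving the coded block pattern described above from the quantized coefficients of the six blocks. It assumes the usual block order Y0 to Y3, Cb, Cr, with Y0 in the most significant of the six bits; the variable-length coding of the pattern itself is not shown, and the function name is hypothetical.

```c
/* blocks: six 8x8 blocks of quantized coefficients, stored one after
   another (block i occupies blocks[64 * i] .. blocks[64 * i + 63]).
   Block i is coded if any coefficient is nonzero; it sets bit (5 - i).
   A result of 0 means that no block in the macroblock is coded. */
int coded_block_pattern(const int blocks[6 * 64])
{
    int cbp = 0;
    for (int i = 0; i < 6; i++) {
        for (int k = 0; k < 64; k++) {
            if (blocks[64 * i + k] != 0) {
                cbp |= 1 << (5 - i);
                break;
            }
        }
    }
    return cbp;
}
```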

D.6.5.3 Selection of macroblock type 

The encoder has more decisions to make for macroblocks in B-pictures than in P-pictures. In the simulation
model used during the development of ISO/IEC 11172, the following sequential decisions were made:

1: motion compensation mode, i.e. is forward, backward or interpolated motion compensation best, and
what are the vector values?

2: intra/non-intra decision, i.e. is the macroblock type intra, or is it motion compensated using the mode
and the vectors found in step 1?

3: coded/not-coded decision, i.e. if the macroblock is non-intra, is the residual error large enough to be
coded using the DCT transform?

4: decide whether the quantizer scale is satisfactory or should be changed.

These decisions are summarized in the following diagram: 





Figure D.36 - Selection of macroblock type in B-pictures
(Begin -> forward, backward or interpolated MC -> non-intra or intra. Non-intra: Quant -> pred-*cq;
No Quant -> pred-*c; Not Coded -> pred-* or skipped. Intra: Quant -> intra-q; No Quant -> intra-d.
* = i, f or b.)
The four decision steps are discussed in the next four clauses. 
D.6.5.3.1 Selecting motion-compensation mode 

It is desirable to code B-pictures using skipped macroblocks if possible. This suggests that the
encoder should first examine the case where the motion compensation is the same as for the previous
macroblock. If the previous macroblock was non-intra, and if the motion-compensated block is good
enough, there will be no additional DCT correction required and the block can be coded as skipped.

If the macroblock cannot be coded as skipped, then the following procedure may be followed. 

For the simulation model, the selection of a motion compensation mode for a macroblock was based on the
minimization of a cost function. The cost function was the MSE of the luminance difference between the
motion-compensated macroblock and the current macroblock. The encoder first calculated the best
motion-compensated macroblock for forward motion compensation. It then calculated the best motion-compensated
macroblock for backward motion compensation by a similar method. Finally, it averaged the two motion-
compensated macroblocks to produce the interpolated macroblock, and selected the mode that gave the smallest MSE.
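The mode decision of the simulation model can be sketched as follows, assuming the best forward and backward predictions have already been found by motion estimation (the search itself is not shown). The sum of squared differences stands in for the MSE; both give the same choice because every candidate covers the same 256 luminance pels. The enum and function names are illustrative.

```c
typedef enum { MC_FORWARD, MC_BACKWARD, MC_INTERPOLATED } mc_mode;

/* Sum of squared differences over a 16x16 luminance macroblock. */
static long sse(const unsigned char *a, const unsigned char *b)
{
    long s = 0;
    for (int i = 0; i < 256; i++) {
        long d = (long)a[i] - b[i];
        s += d * d;
    }
    return s;
}

/* Picks the mode whose prediction is closest to the current macroblock.
   The interpolated candidate is the rounded average of fwd and bwd.
   Ties keep the earlier mode in the order forward, backward, interpolated. */
mc_mode select_mc_mode(const unsigned char cur[256],
                       const unsigned char fwd[256],
                       const unsigned char bwd[256])
{
    unsigned char interp[256];
    for (int i = 0; i < 256; i++)
        interp[i] = (unsigned char)((fwd[i] + bwd[i] + 1) / 2);

    long e_f = sse(cur, fwd), e_b = sse(cur, bwd), e_i = sse(cur, interp);
    mc_mode best = MC_FORWARD;
    long best_e = e_f;
    if (e_b < best_e) { best = MC_BACKWARD; best_e = e_b; }
    if (e_i < best_e) { best = MC_INTERPOLATED; }
    return best;
}
```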
D.6.5.3.2 Intra/non-intra coding decision

D.6.5.3.3 Coded/not-coded decision 
D.6.5.4 DCT transform

Coefficients of blocks are transformed into quantized coefficients in the same way that they are for blocks in
P-pictures.

D.6.5.5 Quantization of B-pictures 

Blocks in B-pictures are quantized in the same way as for P-pictures. 

D.6.5.6 Coding quantized coefficients 

Blocks in B-pictures are coded the same way as blocks in P-pictures. 

D.6.6 Coding D-pictures 

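D-pictures contain only low-frequency information. They are intended to be used for fast visible search
modes, and it is intended that the low-frequency information they contain is sufficient for the user to locate
the desired video.

D-pictures are coded as the dc coefficients of blocks. There is a bit transmitted for the macroblock type,
although only one macroblock type exists. In addition, there is a bit denoting end of macroblock.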
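Since a D-picture carries only dc information, a decoder can display it as a reduced-resolution image of block averages. The sketch below computes those averages directly from a luminance plane; for the DCT used in ISO/IEC 11172 the dc coefficient equals 8 times the block mean, so the two representations differ only by that scale. The function name is hypothetical.

```c
/* Builds a 1/8-scale preview of a luminance plane: each output pel is the
   rounded mean of one 8x8 block. width and height must be multiples of 8;
   out must hold (width / 8) * (height / 8) pels. */
void dc_preview(const unsigned char *plane, int width, int height, unsigned char *out)
{
    for (int by = 0; by < height / 8; by++) {
        for (int bx = 0; bx < width / 8; bx++) {
            int sum = 0;
            for (int y = 0; y < 8; y++)
                for (int x = 0; x < 8; x++)
                    sum += plane[(by * 8 + y) * width + bx * 8 + x];
            out[by * (width / 8) + bx] = (unsigned char)((sum + 32) / 64);
        }
    }
}
```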

D.6.7 Coding at lower picture rates 
Table D.22 - Example of the coded data elements needed to generate repeated pictures

Data element                       Bits   Bit pattern
picture_start_code                 32     0000 0000 0000 0000 0000 0001 0000 0000
temporal_reference                 10     xxxx xxxxxx
picture_coding_type                 3     010
vbv_delay                          16     xxxx xxxx xxxx xxxx
full_pel_forward_vector             1     0
forward_f_code                      3     001
stuffing                            7     0000 000
slice_start_code                   32     0000 0000 0000 0000 0000 0001 0000 0001
quantizer_scale                     5     00001
macroblock_address_increment        1     1
macroblock_type                     3     001
motion_horizontal_forward_code      1     0
motion_vertical_forward_code        1     0
macroblock_escape (x 11)          121     0000 0001 000 (x 11)
macroblock_address_increment       11     0000 0011 001
macroblock_type                     3     001
motion_horizontal_forward_code      1     0
motion_vertical_forward_code        1     0
stuffing                            4     0000
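The bit counts in table D.22 add up to 256 bits (32 bytes) per repeated picture. The long run of skipped macroblocks is expressed with macroblock_escape codes, each adding 33 to the increment that follows. Assuming the MPEG-1 increment table, in which 0000 0011 001 codes the value 32, the second increment is 11 x 33 + 32 = 395, so the last macroblock address is 395 and the picture has 396 macroblocks. That is consistent with a 352 x 288 SIF picture of 22 x 18 macroblocks, although this is an inference rather than something stated here. The hypothetical helper below performs the split.

```c
/* Splits a macroblock address increment into macroblock_escape codes
   (each worth 33) and a final macroblock_address_increment value in 1..33. */
void split_increment(int increment, int *escapes, int *final_value)
{
    *escapes = (increment - 1) / 33;
    *final_value = increment - 33 * *escapes;
}
```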




D.7 Decoding MPEG video 

D.7.1 Decoding a sequence 
D.7.1.1 Decoding for forward playback 

At the start of the sequence, the decoder decodes the sequence header and sets up its parameters to match
those defined in the sequence header. These include the horizontal and vertical resolutions, the aspect
ratio, the bit rate, and the quantization matrices.

The decoder then decodes the first picture header in the group of pictures and reads the vbv_delay field.
If the decoder uses the vbv_delay field, rather than the information in the system stream, to control the
start of decoding, it waits for the time determined by the vbv_delay before decoding the first picture.

If the broken_link bit is set in a group of pictures header, the decoder may adopt one of several strategies;
for example, it may choose not to display the undecodable B-pictures. It is likely that the broken link has
also affected synchronization and buffer fullness, since the bitstream is discontinuous at this point. The
decoder may then delay decoding the next picture until the buffer has filled to the appropriate level.

Subsequent pictures are processed at the appropriate times to avoid buffer overflow and underflow.
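A small helper for the vbv_delay handling described above. It assumes the MPEG-1 picture-header semantics, in which vbv_delay counts periods of a 90 kHz clock and the value 0xFFFF means the field is not to be used; the function name is hypothetical.

```c
#include <stdbool.h>

/* Converts the 16-bit vbv_delay field into seconds. Returns false for
   0xFFFF, in which case the decoder must rely on other timing information
   (for example time stamps in the system stream). */
bool vbv_delay_seconds(unsigned vbv_delay, double *seconds)
{
    if (vbv_delay == 0xFFFF)
        return false;
    *seconds = vbv_delay / 90000.0;   /* 90 kHz clock periods */
    return true;
}
```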
D.7.1.2 Decoding for fast playback 

For example, a sequence might be coded as follows: 

IBPBPBPBPBIBPBPBPBPBI... 

The speed-up that can be achieved depends on the number of bits in the pictures that are selected. If only the
I-pictures in this example were selected, then one I-picture would be transmitted every 2.5 picture periods,
giving a speed-up of 10/2.5 = 4.

If one in N I-pictures of the preceding example were selected, then the speed-up rate would be 10N/2.5 = 4N.
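The speed-up arithmetic can be written out as below. `gop_length` (pictures between successive I-pictures) and `i_periods` (picture periods needed to deliver one I-picture at the normal bit rate) are hypothetical parameter names for the quantities 10 and 2.5 in the example.

```c
/* Speed-up obtained by showing one in n I-pictures: each shown picture
   stands for gop_length * n pictures of normal playback and takes
   i_periods picture periods to deliver. */
double fast_playback_speedup(int gop_length, int n, double i_periods)
{
    return gop_length * n / i_periods;
}
```

With `gop_length = 10` and `i_periods = 2.5` this gives 4 for N = 1 and 12 for N = 3, matching 4N.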
D.7.1.3 Decoding for pause and step modes 

D.7.1.4 Decoding for reverse playback 

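Reverse playback places additional demands on the decoder, in particular on picture storage, but these can
be reduced by decoding the pictures in a suitable order. To illustrate the saving, consider the following
group of pictures: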
B  B  I  B  B  P  B  B  P  B  B  P     Pictures in display order
0  1  2  3  4  5  6  7  8  9  10 11    temporal reference

I  B  B  P  B  B  P  B  B  P  B  B     Pictures in coded order
2  0  1  5  3  4  8  6  7  11 9  10    temporal reference

I  P  P  P  B  B  B  B  B  B  B  B     Pictures in new order
2  5  8  11 10 9  7  6  4  3  1  0     temporal reference

Figure D.37 - Example group of pictures

The decoder would decode the pictures in the new order, and display them in the reverse of the normal
display order. Since the B-pictures are not decoded until they are ready to be displayed, the display delay and
storage are minimized. The first two B-pictures, 0 and 1, would remain stored in the input buffer until the
last P-picture in the previous group of pictures is decoded.
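The new order of figure D.37 can be generated mechanically from the display order: I- and P-pictures in forward order, followed by the B-pictures in reverse order. This is a sketch under that reading of the figure; the type and function names are hypothetical.

```c
#include <stddef.h>

/* One picture of a group of pictures: type ('I', 'P' or 'B') and
   temporal reference. */
typedef struct { char type; int tref; } picture;

/* gop is in display order. Writes the reverse-playback decoding order to
   out (anchors forward, then B-pictures backward); returns the count. */
size_t reverse_playback_order(const picture *gop, size_t n, picture *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (gop[i].type != 'B')
            out[k++] = gop[i];
    for (size_t i = n; i-- > 0; )
        if (gop[i].type == 'B')
            out[k++] = gop[i];
    return k;
}
```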

D.8 Post processing 



D.8.1 Editing



Editing of video is best performed before compression, but situations arise where only the coded
bitstream is available. One possible method would be to decode the bitstream, perform the required editing,
and re-encode the bitstream. This usually leads to a loss in video quality.

Although editing may take several forms, the following discussion pertains only to editing at the picture
level: deletion of coded video material from a bitstream, insertion of coded video material into a bitstream,
or rearrangement of coded video material within a bitstream.

If coded video is intended to be used as editing material, analogous to clip art for still pictures, then the
video can be encoded with the cutting points in mind.

An editor must take care to ensure that the bitstream it produces is a legal bitstream. In particular, it must
ensure that the new bitstream complies with the requirements of the video buffering verifier. In general, it
may not be possible to join together arbitrary sections of bitstreams.



Figure D.38 - Sequences (original and edited)

It may, however, be possible to deliberately encode bitstreams in a manner that allows some editing to occur.
For instance, if all groups of pictures had the same number of pictures and were encoded with the same
number of bits, then many of the problems of complying with the video buffering verifier would be solved.

The easiest editing task is to cut at the beginning of groups of pictures. If the group of pictures following
the cut is open, which can be detected by the closed_gop flag in the group of pictures header, then the
editor must set the broken_link bit to 1 to indicate to the decoder that the previous group of
pictures cannot be used for decoding any B-pictures.
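A sketch of the broken_link adjustment for a cut at the start of a group of pictures. It assumes the MPEG-1 group of pictures header layout (group_start_code 0x000001B8, a 25-bit time_code, then closed_gop and broken_link), which places both flags in the eighth byte of the header; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counting from the first bit after the 32-bit start code, time_code
   occupies bits 0..24, closed_gop is bit 25 and broken_link bit 26,
   i.e. masks 0x40 and 0x20 of header byte 7. */
#define GOP_FLAGS_BYTE   7
#define CLOSED_GOP_MASK  0x40
#define BROKEN_LINK_MASK 0x20

/* Call on the first group of pictures header after a cut. If the group
   is open (closed_gop == 0), sets broken_link. Returns false if the
   buffer is too short or does not start with a group_start_code. */
bool mark_broken_link(uint8_t *hdr, size_t len)
{
    if (len < 8 || hdr[0] != 0x00 || hdr[1] != 0x00 || hdr[2] != 0x01 || hdr[3] != 0xB8)
        return false;
    if (!(hdr[GOP_FLAGS_BYTE] & CLOSED_GOP_MASK))
        hdr[GOP_FLAGS_BYTE] |= BROKEN_LINK_MASK;
    return true;
}
```

A closed group of pictures is left unchanged, because its B-pictures do not refer to the previous group.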
D.8.2 Resampling

D.8.2.1 Conversion of MPEG SIF to CCIR 601 format 

A SIF is converted to its corresponding CCIR 601 format by spatial upsampling. A linear phase FIR filter is used, as shown in figure D.39.



| -12 | 0 | 140 | 256 | 140 | 0 | -12 | //256
Figure D.39 - Upsampling filter for luminance
At the end of the lines some special technique, such as replicating the last pel, must be adopted. 
According to CCIR Rec. 601, the chrominance samples need to be co-sited with alternate luminance samples.
The chrominance upsampling filter should therefore have an even number of taps, as shown in figure D.40.



| 1 | 3 | 3 | 1 | //4

Figure D.40 - Upsampling filter for chrominance

The input to the upsampling filters is constructed by adding four black pels to each end of the horizontal
luminance lines in the decoded bitmap, and two gray pels to each end of the horizontal chrominance lines. The
luminance SIF should then be upsampled once horizontally and once vertically. The chrominance SIF should be
upsampled once horizontally and twice vertically. This process is illustrated by the following diagram:

Figure D.41 - Simplified decoder block diagram
(a) Luma: SIF Y, 360 x 240/288 after padding -> horizontal upsampling filter -> 720 x 240/288 -> vertical
upsampling filter -> CCIR 601 Y, 720 x 480/576.
(b) Chroma: SIF U, V, 180 x 120/144 after padding -> vertical upsampling filter -> 180 x 240/288 -> horizontal
upsampling filter -> 360 x 240/288 -> vertical upsampling filter -> CCIR 601 U, V, 360 x 480/576.
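The horizontal luminance step of figure D.41 can be sketched with the filter of figure D.39: even output pels copy the input (the taps at distance 2 are zero) and odd output pels are interpolated from the four nearest input pels with weights 140 and -12. The black level of 16 (the CCIR 601 luminance black) and the replication of the last pel at the line ends are assumptions, since the text only asks for "some special technique"; vertical filtering and chrominance are not shown, and the names are hypothetical.

```c
/* Clamps an index into 0..n-1, which replicates the end pels. */
static int clamp_index(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

static unsigned char clip_pel(int v) { return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Pads a SIF luminance line of n pels with four black pels at each end
   (352 -> 360) and upsamples it 2:1 with taps (-12, 0, 140, 256, 140, 0, -12)/256.
   out must hold 2 * (n + 8) pels. Returns the output length, or -1 if the
   line is too long for the internal buffer. */
int upsample_luma_line(const unsigned char *sif, int n, unsigned char *out)
{
    enum { PAD = 4, MAXW = 1024, BLACK = 16 };
    unsigned char line[MAXW];
    int m = n + 2 * PAD;
    if (m > MAXW)
        return -1;
    for (int i = 0; i < m; i++)
        line[i] = (i < PAD || i >= n + PAD) ? BLACK : sif[i - PAD];
    for (int i = 0; i < m; i++) {
        out[2 * i] = line[i];                       /* centre tap 256/256 */
        int acc = 140 * (line[i] + line[clamp_index(i + 1, m)])
                - 12 * (line[clamp_index(i - 1, m)] + line[clamp_index(i + 2, m)]);
        out[2 * i + 1] = clip_pel(acc < 0 ? 0 : (acc + 128) / 256);
    }
    return 2 * m;
}
```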
D.8.2.2 Temporal resampling 

Temporal resampling techniques are widely used in the television industry, and the same techniques can be
applied here, for example the 3:2 pulldown conversion of 24 pictures/s to 60 fields/s.



480 

7576 



106 



Exhibit 18, page 176 



©ISO/IEC 

ISO/IEC 11172-2: 1993(E) 
Other picture rates may be converted to a field rate twice as large using the same method.

Speeding up playback to change the picture rate also increases the pitch of the audio.
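One common way of mapping 24 pictures/s onto 60 fields/s is 3:2 pulldown, in which pictures are shown alternately for three fields and two fields, so that 4 pictures fill 10 fields. The sketch below only records which source picture feeds each output field; field parity and audio are not handled, and the function name is hypothetical.

```c
/* Writes, for each output field, the index of the source picture it comes
   from (pictures alternately repeated 3 and 2 times). field_src must hold
   at least (5 * pictures + 1) / 2 entries. Returns the number of fields. */
int pulldown_32(int pictures, int *field_src)
{
    int f = 0;
    for (int p = 0; p < pictures; p++) {
        int repeat = (p % 2 == 0) ? 3 : 2;
        for (int r = 0; r < repeat; r++)
            field_src[f++] = p;
    }
    return f;
}
```

One second of 24 pictures/s material therefore yields exactly 60 fields.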












Annex F 

(informative) 

List of patent holders 



The patent holders listed below have filed with the ITTF a statement of willingness to grant a license under
such rights that they hold on reasonable and non-discriminatory terms and conditions to applicants desiring
to obtain such a license.



Information regarding such patents can be obtained from : 



AT&T 

32 Avenue of the Americas 
New York 
NY 10013-2412 
USA 



Aware 

1 Memorial Drive 

Cambridge 

02142 Massachusetts 

USA 



Bellcore 

290 W Mount Pleasant Avenue 

Livingston 

NJ 07039 

USA 

The British Broadcasting Corporation 

Broadcasting House 

London 

W1A 1AA 

United Kingdom 

British Telecommunications plc

Intellectual Property Unit 

13th Floor 

151 Gower Street 

London 

WC1E 6BA

United Kingdom 

CCETT 

4 Rue du Clos-Courtel 

BP 59 

F-35512 

Cesson-Sévigné Cedex
France 
CNET

38-40 Rue du General Leclerc
F-92131 Issy-les-Moulineaux 
France 

Compression Labs, Incorporated 

2860 Junction Avenue 

San Jose 

CA 95134 

USA 

CSELT 

Via G Reiss Romoli 274 

I-10148 Torino

Italy 

CompuSonics Corporation 

PO Box 61017 

Palo Alto 

CA 94306 

USA 

Daimler-Benz AG
PO Box 800 230 
Epplestrasse 225 
D-7000 Stuttgart 80 
Germany 

Dornier GmbH

An der Bundesstrasse 31 

D-7990 Friedrichshafen 1

Germany 

Fraunhofer Gesellschaft zur Foerderung der Angewandten Forschung e V

Leonrodstrasse 54 

8000 Muenchen 19

Germany 

Hitachi Ltd 

6 Kanda-Surugadai 4 chome 

Chiyoda-ku

Tokyo 101 

Japan 

Institut für Rundfunktechnik GmbH
Floriansmühlstraße 60
8000 München 45
Germany 

International Business Machines Corporation 
Armonk 

New York 10504 
USA 

KDD Corporation 

2-3-2 Nishishinjuku 

Shinjuku-ku

Tokyo 

Japan 
Licentia Patent-Verwaltungs-GmbH
Theodor-Stern-Kai
D-6000 Frankfurt 70 
Germany 

Massachusetts Institute of Technology 

20 Ames Street 

Cambridge 

Massachusetts 02139 

USA 

Matsushita Electric Industrial Co. Ltd 

1006 Oaza-Kadoma

Kadoma

Osaka 571 

Japan 

Mitsubishi Electric Corporation 

2-3 Marunouchi 

2-Chome 

Chiyoda-Ku 

Tokyo 100

Japan

NEC Corporation 

7-1 Shiba 5-Chome

Minato-ku 

Tokyo 

Japan 

Nippon Hoso Kyokai 
2-2-1 Jin-nan 
Shibuya-ku 
Tokyo 150-01 
Japan 

Philips Electronics NV 
Groenewoudseweg 1 
5621 BA Eindhoven
The Netherlands 

Pioneer Electronic Corporation 

4-1 Meguro 1-Chome 

Meguro-ku 

Tokyo 153 

Japan 

Ricoh Co, Ltd 
1-3-6 Nakamagome 
Ohta-ku 
Tokyo 143 
Japan 

Schwartz Engineering & Design
15 Buckland Court 
San Carlos, CA 94070 
USA 






Sony Corporation 
6-7-35 Kitashinagawa 
Shinagawa-ku 
Tokyo 141 
Japan 

Symbionics 

St John's Innovation Centre 
Cowley Road 
Cambridge 
CB4 4WS 
United Kingdom 

Telefunken Fernseh und Rundfunk GmbH 
Göttinger Chaussee
D-3000 Hannover 91 
Germany 

Thomson Consumer Electronics 

9, Place des Vosges

La Defense 5 

92400 Courbevoie 

France

Toppan Printing Co, Ltd 

1-5-1 Taito 

Taito-ku 

Tokyo 110 

Japan 

Toshiba Corporation 
1-1 Shibaura 1-Chome
Minato-ku 
Tokyo 105 
Japan 

Victor Company of Japan Ltd 

12 Moriya-cho 3 chome 

Kanagawa-ku 

Yokohama 

Kanagawa 221

Japan 